Abstract

In this paper we propose a new design criterion and a new class 
of unitary signal constellations for differential space-time modulation 
for multiple-antenna systems over Rayleigh flat-fading channels with 
unknown fading coefficients. Extensive simulations show that the new
codes perform significantly better than existing codes. We have compared
our codes with differential detection schemes using orthogonal designs,
Cayley differential codes, fixed-point-free group codes, and products of
groups; at the same bit error rate, our codes require a signal-to-noise
ratio that is lower by as much as 10 dB.

The design of the new codes is accomplished in a systematic way 
through the optimization of a performance index that closely describes 
the bit error rate as a function of the signal to noise ratio. The new 
performance index is computationally simple and we have derived an- 
alytical expressions for its gradient with respect to constellation pa- 
rameters. 

Decoding of the proposed constellations is reduced to a set of one- 
dimensional closest point problems that we solve using parallel sphere 
decoder algorithms. This decoding strategy can also improve efficiency 
of existing codes. 

1 Introduction 

Recently there has been extensive research interest in wireless
communication links with multiple transmitter antennas. For Rayleigh-fading
channel models, information-theoretic analysis has shown that the capacity
of a communication link with multiple transmitter antennas can
substantially exceed that of a single-antenna link [10], [11], [31], [36], [55]. Several
coding and modulation schemes have also been proposed to exploit the po- 
tential increase in the capacity through space diversity. For the coherent 
multiple-antenna channel, several transmit diversity methods and code con- 
struction have been presented in [3], [12], [33] and the references therein 
(see, e.g., [7], [lj, [T3], [17], [M]-[36], [38], [37], [50]-[53]). In particular, 
Tarokh, Seshadri, and Calderbank [32] proposed space-time codes which 
combine signal processing at the receiver with coding techniques appropri- 
ate to multiple transmitter antennas. Alamouti [3] discovered a remarkable 
transmitter diversity scheme for two transmitter antennas, which was later 
generalized by Tarokh et al. [43] as a framework for space-time block codes. 
Motivated by the fact that, in many situations, channel state information 
may not be available to the receiver, Hochwald and Marzetta [21] proposed a 
general signaling scheme, called unitary space-time modulation, and showed 
that this scheme can achieve a high ratio of channel capacity in combination 
with channel coding. The design of unitary space-time constellations was 
investigated in pQ, [18] and [19]. More recently, differential modulation and
code construction methods for multiple transmit antennas have been
proposed by Hochwald et al. [20], Hughes [22], Tarokh et al. [251 HI] and some
other researchers (see, e.g., @], [16], [23], [24], [27], [30], [32], [40], [41], [49],
M)-

We investigate the encoding and decoding issues for the differential
unitary space-time modulation scheme independently proposed by Hochwald
and Sweldens in [20] and Hughes in [22]. A number of unitary space-time
codes have been proposed with the aim of achieving high performance
together with low encoding and decoding complexity. Among these, we recall
the orthogonal design (see [251133]), cyclic group codes [20], [22], Cayley
differential (CD) codes [E], and the full-diversity codes such as
fixed-point-free (FPF) unitary group codes $G_{m,r}$, non-group codes
$S_{m,s}$ and products of cyclic groups [39]. Orthogonal design has
extremely low decoding complexity; unfortunately, its performance degrades
significantly when the number of receiver antennas is more than one or the
data rate is high. Cayley differential codes and the full-diversity codes
outperform orthogonal designs in many cases, but their decoding complexity
is much higher than that of orthogonal designs. The main idea in decoding
the full-diversity codes and Cayley differential codes is to formulate the
decoding problem as a closest point problem and then solve it by existing
methods such as the "LLL" lattice algorithm and the sphere decoder
algorithm. The decoding complexity depends critically on the dimension of
the underlying closest point problem.

In this paper we develop a new paradigm for the design of unitary
space-time codes with high performance and low encoding and decoding
complexity. Similar to the full-diversity codes $G_{m,r}$, $S_{m,s}$ and
products of cyclic groups, our proposed constellations also use diagonal
matrices as the kernel for fast decoding. However, in sharp contrast to
those existing codes, which are parameterized by special integers, our
constellations are defined by real-valued parameters and are not restricted
to have full diversity or group structure. Consequently, a unitary
space-time code with our proposed structure exists for any combination of
antennas and constellation size.

We define a code performance index that describes the bit error rate 
as a function of the signal to noise ratio. The index is simple to evaluate 
yet highly accurate in the normal signal to noise ratio (SNR) region. As a
result, it is possible to bring all the power of nonlinear programming into
the code design. We have developed a complete gradient descent algorithm
to design constellations that are optimal with respect to the bit error rate. It 
should be noted that the idea of code design by gradient-based optimization 
for non-coherent MIMO channels was proposed before in [1] and [16]. A 
systematic design of unitary constellation based on random search has been 
proposed in [19] . Our approach differs from the previous works in the design 
criterion, the structure of signal constellations, and the decoding method. 
We attempt to apply gradient descent techniques to directly minimize the 
bit error rate over signal constellations which allow for efficient decoding 
algorithms. 

Exploiting the special structure of our proposed constellations, the de- 
coding problem is reduced to one-dimensional closest point problems which 
can be efficiently solved in parallel. Based on that strategy, we have de- 
veloped parallel sphere decoder algorithms which can also be applied to 
improve decoding efficiency of existing codes. 

Based on the new structure and using the optimal design techniques, we 
have obtained constellations which significantly outperform existing ones. 
For example, with spectral efficiency R = 6 bits per channel use, we have
found a constellation which improves upon orthogonal design by about 10
dB at block error rate $6 \times 10^{-2}$ when using two transmitter and two
receiver antennas. With the same configuration, the corresponding
improvement upon the Cayley differential code is about 9 dB.

In the rest of this section we establish the notation and describe the
channel model. Section 2 introduces the structure of the new constellations
and develops the optimization procedure for their design. Specifically, we
introduce the performance index that converts constellation design into a
minimization problem amenable to steepest descent techniques and derive
simple expressions for the computation of its gradient. Section 3 develops a
parallel sphere decoder algorithm that can also be applied to improve
existing codes. Section 4 presents the results of the simulations performed.
Section 5 summarizes our findings. Proofs and constellation data are
provided in the Appendices.

1.1 Notation 

Throughout this paper, we use the following notation.

$\mathbb{R}$: real number field;
$\mathbb{C}$: complex number field;
$\mathbb{Z}$: integer set;
$\lfloor \cdot \rfloor$: floor function;
$\lceil \cdot \rceil$: ceiling function;
$\lfloor x \rceil$: the integer closest to $x$;
$\mathrm{mod}^*(x)$: symmetric modulus operation such that $\mathrm{mod}^*(x)$ has range $\left[-\frac{x}{2}, \frac{x}{2}\right)$;
$\arg(\cdot)$: phase angle operator taking values in $[-\pi, \pi)$;
$\det(\cdot)$: determinant function;
$\mathrm{tr}(\cdot)$: trace function;
$\mathrm{diag}([x_1, \ldots, x_n])$: diagonal matrix with $x_p$ at the p-th row and the p-th column;
$\|X\|$: Euclidean norm of vector $X$;
$\|X\|_F$: Frobenius norm of matrix $X$;
$[X]_{pq}$: entry of $X$ at the p-th row and q-th column;
$\Re(X)$: real part of $X$;
$\Im(X)$: imaginary part of $X$;
$X^T$: transpose of $X$;
$X^\dagger$: conjugate transpose of $X$;
$\mathrm{abs}(X)$: the matrix obtained by replacing each entry of $X$ with its modulus;
$\mathcal{CN}(0, 1)$: complex random variable with zero mean and variance one;
$\nabla g(x)$: gradient of function $g(x)$.

1.2 Channel Model 

Consider a communication link with M transmitter and N receiver antennas
operating in a Rayleigh flat-fading channel, which can be described by the
following channel model [20]:

\[ X_\tau = \sqrt{\rho}\, S_\tau H_\tau + W_\tau \]

where $\tau$ is the index of the time frame, $H_\tau \in \mathbb{C}^{M \times N}$
is the channel matrix with $\mathcal{CN}(0, 1)$ entries and is unknown to the
receiver and the transmitter, $S_\tau \in \mathbb{C}^{M \times M}$ is the
transmitted signal, $X_\tau \in \mathbb{C}^{M \times N}$ is the received signal,
$W_\tau \in \mathbb{C}^{M \times N}$ is Gaussian noise with $\mathcal{CN}(0, 1)$
entries, and $\rho$ is the expected SNR at each receiver antenna. It should be
noted that the channel matrix $H_\tau$ has been normalized so that the SNR
does not depend on the number of transmitter antennas. It is assumed that
the channel matrix is approximately constant within two consecutive time
frames, i.e., $H_\tau \approx H_{\tau-1}$. However, for the $\tau$-th and the
$t$-th time frames that are not consecutive, $H_\tau$ and $H_t$ are mutually
independent and thus their realizations can be significantly different. The
transmitted signals are determined by the following fundamental
differential transmitter equations [20]:

\[ S_0 = I_{M \times M}, \qquad S_\tau = V_\tau S_{\tau-1}, \quad \tau = 1, 2, \ldots \]

where $V_\tau \in \mathbb{C}^{M \times M}$ is a unitary matrix picked from the
signal constellation $\mathcal{V}$. It is shown in [20], [21], [22] that
maximum-likelihood (ML) detection is to minimize

\[ \| X_\tau - V_\ell X_{\tau-1} \|_F \]

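among all possible $V_\ell \in \mathcal{V}$. The Chernoff bound of the
pair-wise probability of mistaking $V_\ell$ for $V_{\ell'}$ or vice versa is
given by [21]

\[ P(V_\ell, V_{\ell'}) = \frac{1}{2} \prod_{m=1}^{M} \left[ 1 + \frac{\rho^2}{4(1 + 2\rho)}\, \sigma_m^2 \right]^{-N} \tag{1} \]

where $\sigma_m$ is the m-th singular value of $V_\ell - V_{\ell'}$.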
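As an illustration of the channel model, the differential transmitter equations, the ML rule and the Chernoff bound above, the following Python sketch simulates a link with N = 1 and a diagonal constellation, and evaluates the pair-wise bound from singular values. The function names, the choice of a diagonal constellation (so that every matrix can be stored by its diagonal) and all parameter values are assumptions made for the example; they are not taken from [20] or [21].

```python
import cmath
import math
import random

def diagonal_constellation(lams, L):
    """V_l = diag(exp(i*2*pi*lam_m*l/L)); each matrix is stored by its diagonal."""
    return [[cmath.exp(2j * math.pi * lam * l / L) for lam in lams]
            for l in range(L)]

def cn01(rng):
    """Draw a CN(0,1) sample (real and imaginary parts of variance 1/2)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def simulate(symbols, lams, L, rho, noisy=True, seed=0):
    """Differentially encode `symbols`, send them over a constant Rayleigh
    channel with N = 1, and detect them with the ML rule."""
    rng = random.Random(seed)
    V = diagonal_constellation(lams, L)
    M = len(lams)
    H = [cn01(rng) for _ in range(M)]      # channel, unknown to the receiver
    S = [1.0 + 0.0j] * M                   # diagonal of S_0 = I

    def receive(S_diag):
        return [math.sqrt(rho) * S_diag[m] * H[m]
                + (cn01(rng) if noisy else 0.0) for m in range(M)]

    X_prev, decoded = receive(S), []
    for z in symbols:
        S = [V[z][m] * S[m] for m in range(M)]   # S_tau = V_z S_{tau-1}
        X = receive(S)
        # ML metric ||X_tau - V_l X_{tau-1}||^2 for every candidate l
        metric = [sum(abs(X[m] - V[l][m] * X_prev[m]) ** 2 for m in range(M))
                  for l in range(L)]
        decoded.append(min(range(L), key=metric.__getitem__))
        X_prev = X
    return decoded

def chernoff_bound(sigmas, rho, N):
    """(1/2) * prod_m [1 + rho^2 sigma_m^2 / (4 (1 + 2 rho))]^(-N)."""
    g = rho ** 2 / (4.0 * (1.0 + 2.0 * rho))
    bound = 0.5
    for s in sigmas:
        bound *= (1.0 + g * s ** 2) ** (-N)
    return bound

def diagonal_singular_values(Va, Vb):
    """For diagonal matrices, the singular values of Va - Vb are |differences|."""
    return [abs(a - b) for a, b in zip(Va, Vb)]
```

Without noise the detector recovers every transmitted index even though the channel is never estimated, which is the point of the differential scheme. With noise, `chernoff_bound` gives the pair-wise quantity on which the design index of Section 2 is built.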

2 A New Constellation Design Approach 

The new code design paradigm that we propose uses diagonal matrices, as 
in [39], to simplify the decoding process. Our approach is similar to [1] 
and [16] in the spirit of relaxing the code structures from strict structures 
such as orthogonal or diagonal structure, parameterizing the codes, and 
employing the powerful gradient-based optimization to find the best codes. 
A significant new feature is the ability to formulate the design as a
nonlinear programming problem that directly minimizes the bit error rate. In
the following, we begin our presentation by introducing new codes that are
functions of real-valued variables and do not require full diversity. Then we
introduce the cost function and derive expressions for its gradient that are
used in a steepest descent design algorithm.



2.1 Constellation Structure 

In this section, we introduce a new class of unitary space-time codes which
can be efficiently encoded and decoded. Similar to the full-diversity codes
such as the FPF code $G_{m,r}$, the non-group code $S_{m,s}$ and products of
cyclic groups [39], our proposed constellation also involves diagonal
matrices for fast decoding. However, in sharp contrast to those
full-diversity codes, which are parameterized by particular integers, our
proposed constellations are determined by continuous parameters and are not
restricted to have full diversity or group structure. Consequently, a
unitary space-time code with our proposed structure exists for any
combination of antennas and constellation size.

Let $b \geq 0$ be an integer and let $L$ be a power of 2. We construct a
constellation $\mathcal{V}$ with $\mathcal{L} = 2^b L$ signal matrices as follows.



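For $q = 0, 1, \ldots, 2^b - 1$, define

\[ \Lambda_q = \mathrm{diag}\left( \left[ \exp\left( \frac{i 2\pi \lambda_{q,1}}{L} \right), \ldots, \exp\left( \frac{i 2\pi \lambda_{q,M}}{L} \right) \right] \right) \]

where $\lambda_{q,1} = 1$ and $\lambda_{q,m} \in [0, L)$, $m = 2, \ldots, M$ are
real-valued parameters. Let $A_0 = B_0 = I$ and let $A_q$, $B_q$,
$q = 1, \ldots, 2^b - 1$ be unitary matrices. Then the constellation is given by

\[ \mathcal{V} = \left\{ A_q \Lambda_q^{\ell} B_q \;\middle|\; \ell = 0, 1, \ldots, L-1;\ q = 0, 1, \ldots, 2^b - 1 \right\}. \]

We note that the constellation design problem is to find $\lambda_{q,m}$ and
$A_q$, $B_q$ so that the bit error rate is minimized. We shall show that this
problem can be solved efficiently.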
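The construction can be made concrete with a short Python sketch that assembles the $2^b L$ matrices $A_q \Lambda_q^\ell B_q$ from given blocks and checks that each one is unitary. The helper names, the use of plain nested lists for matrices, and the example blocks (real 2 x 2 rotations) are assumptions chosen only for illustration; in the paper the blocks are the outcome of the optimization.

```python
import cmath
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Conjugate transpose."""
    return [[X[j][i].conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def diag_power(lams, L, ell):
    """Lambda^ell with Lambda = diag(exp(i*2*pi*lam_m/L))."""
    M = len(lams)
    return [[cmath.exp(2j * math.pi * lams[m] * ell / L) if m == n else 0j
             for n in range(M)] for m in range(M)]

def rotation(angle):
    """A 2 x 2 real rotation, used here as an example unitary block."""
    c, s = math.cos(angle), math.sin(angle)
    return [[complex(c), complex(-s)], [complex(s), complex(c)]]

def build_constellation(A, lams, B, L):
    """A, B: lists of 2^b unitary blocks with A[0] = B[0] = I;
    lams[q]: diagonal parameters of block q, with lams[q][0] = 1."""
    return [matmul(matmul(A[q], diag_power(lams[q], L, ell)), B[q])
            for q in range(len(A)) for ell in range(L)]

def is_unitary(X, tol=1e-12):
    P = matmul(X, dagger(X))
    n = len(X)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

def spectral_efficiency(b, L, M):
    """R = (b + log2 L) / M bits per channel use."""
    return (b + math.log2(L)) / M

# Example with b = 1, L = 4, M = 2 (all values are illustrative).
identity = rotation(0.0)
A = [identity, rotation(0.4)]
B = [identity, rotation(1.1)]
lams = [[1.0, 1.7], [1.0, 2.6]]
V = build_constellation(A, lams, B, 4)
```

In this example the constellation has 8 matrices and a spectral efficiency of 1.5 bits per channel use. Because the diagonal parameters are real numbers rather than integers, the same code works for any M, b and L.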

For the purpose of comparing our constellations with existing ones, we
note that the spectral efficiency of our proposed constellation is

\[ R = \frac{\log_2(\mathcal{L})}{M} = \frac{b + \log_2(L)}{M}. \]

It should be noted that, for the special case $b = 0$, the signal constellation
reduces to

\[ \left\{ \Lambda^{\ell} \;\middle|\; \ell = 0, \ldots, L-1 \right\} \]

where

\[ \Lambda = \mathrm{diag}\left( \left[ \exp\left( \frac{i 2\pi \lambda_1}{L} \right), \ldots, \exp\left( \frac{i 2\pi \lambda_M}{L} \right) \right] \right) \]

with $\lambda_1 = 1$ and continuous parameters $\lambda_m \in [0, L)$,
$m = 2, \ldots, M$. We refer to such a constellation as a continuous diagonal
code. Obviously, it is a generalization of the cyclic group code.

In general, with fixed constellation size $\mathcal{L}$, the performance may be
significantly improved by increasing the number of blocks (i.e., $2^b$).
Interestingly, we shall show that the decoding complexity increases only
slightly with the number of blocks. This property can be attributed to our
parallel sphere decoder algorithms, discussed in Section 3.



2.2 Design Performance Index 

Efficient constellation design is a challenging task due to the large number
of parameters. In addition to the structure of the constellations, the design
criterion is also critical for the achievable bit error rate performance. One
widely used criterion takes the diversity product as the performance
measure of a constellation. The design objective is then to maximize the
diversity product over a class of constellations that have full diversity
(see, e.g., [20], [27], [39] and the references therein). The conventional
design criterion has the following drawbacks. First, the diversity product is
essentially a worst-case measure. In many situations, the overall
performance of a constellation is not governed by the behavior of extreme
signal matrices. As can be seen from our experimental results in Section 4,
it is not uncommon for constellations with zero diversity product to
significantly outperform constellations with the largest diversity product
previously known. Second, the diversity product is derived by an asymptotic
argument. The idea is that, as the SNR tends to infinity, the Chernoff bound
of the pair-wise error probability is dominated by the determinant of the
difference of the pair of unitary matrices. Such an asymptotic argument is
not flawless: it is not clear how large the SNR must be before it can be
treated as infinite without introducing significant inaccuracy in the
evaluation of the block (or bit) error rate.

In light of the limitations of the worst-case and asymptotic design
criterion, we have established a new design criterion which incorporates the
bit assignment in the optimization of constellations. Instead of using a
worst-case criterion such as the diversity product [20], we introduce a
performance index that directly measures the bit error rate as a function of
the signal to noise ratio. The index is analytically tractable and possesses
simple analytical expressions for its gradient. Motivated by the fact that,
for large constellation sizes, the bit error rate may not be well governed by
the block error rate, we shall also incorporate the bit assignments in the
process of constellation optimization. In particular we propose the cost
function



where $P_{\mathrm{bit}}(\rho)$ is the union bound of the bit error probability
and $[\rho_1, \rho_2]$ is the interval of SNR of practical interest. We shall
show that this cost function can be well approximated by a very simple
analytical expression.

From numerous simulation results published in the literature, we notice
that, on a log scale, the bit error rate is an almost linear function of the
SNR. This phenomenon can be illustrated by making use of the Chernoff
bound (1). For large SNR, the Chernoff bound $P(V_\ell, V_{\ell'})$ of the
pair-wise error probability can be approximated by



Since such approximation is tight for most combinations $(\ell, \ell')$ and
$V_\ell$ is assumed to be equally likely for all $\ell$, the union bound of the
bit error rate is well approximated by



where $d_H(\ell, \ell')$ denotes the Hamming distance of the bits assigned to
$V_\ell$ and $V_{\ell'}$. Applying the logarithm operation gives




Figure 1: The area of trapezoid ABCD, or equivalently $-C(\mathcal{V})$, reflects the
quality of constellation $\mathcal{V}$. (Horizontal axis: $10 \log_{10}(\rho)$; vertical axis: $\log_{10} P_{\mathrm{bit}}(\rho)$.)

Figure 1 displays the actual cost function and the proposed approximation.
For completeness we mention that the block error rate admits a similar
approximation.

Due to the excellent linearity of the performance curve on the logarithmic
scale, the cost function can be well approximated by

\[ C(\mathcal{V}) = \left[ \log_{10} P_{\mathrm{bit}}(\rho_2) + \log_{10} P_{\mathrm{bit}}(\rho_1) \right] \times \left[ \log_{10}(\rho_2) - \log_{10}(\rho_1) \right]. \]

We propose to design constellations that minimize the index $C(\mathcal{V})$.

In practice, we can choose $\rho_1$ and $\rho_2$ based on the performance of
the best cyclic group codes previously known. More specifically, $\rho_1$ and
$\rho_2$ can be selected so that two typical levels of bit error rate are
respectively guaranteed. For example, we can find $\rho_1$ and $\rho_2$ such that

\[ P_{\mathrm{bit}}(\rho_1) = 10^{-3}, \qquad P_{\mathrm{bit}}(\rho_2) = 10^{-5} \]

by a bisection method for an existing cyclic group code. Once $\rho_1$ and
$\rho_2$ have been found, the criterion measure $C(\mathcal{V})$ is SNR
independent. Most importantly, the gradient of $C(\mathcal{V})$ with respect
to the constellation parameters can be computed efficiently and thus allows
for a gradient descent method for constellation design. The optimization
technique is described in the next section.
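The selection of the two SNR points and the evaluation of the index can be sketched in Python for the one-block case (b = 0), where the constellation is a diagonal code. The sketch assumes that the union bound is formed from the Chernoff bound of every pair with the natural binary bit assignment, and that the two target levels are bracketed by the search interval; the function names, the reference code with parameters (1, 3) and L = 8, and the interval limits are all illustrative assumptions.

```python
import cmath
import math

def hamming(a, b):
    """Hamming distance between the binary representations of a and b."""
    return bin(a ^ b).count("1")

def p_bit(lams, L, rho, N=1):
    """Union bound on the bit error rate of the diagonal code {Lambda^l}
    (b = 0), using the Chernoff bound for every pair (l, l')."""
    g = rho ** 2 / (4.0 * (1.0 + 2.0 * rho))
    total = 0.0
    for l in range(L - 1):
        for lp in range(l + 1, L):
            pair = 0.5
            for lam in lams:
                sigma = abs(cmath.exp(2j * math.pi * lam * l / L)
                            - cmath.exp(2j * math.pi * lam * lp / L))
                pair *= (1.0 + g * sigma ** 2) ** (-N)
            total += hamming(l, lp) * pair
    return 2.0 * total / (L * math.log2(L))

def snr_for_target(lams, L, target, N=1, lo=-2.0, hi=6.0, iters=200):
    """Bisection on log10(rho) for P_bit(rho) = target.
    Assumes P_bit decreases with rho and the target lies in the bracket."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_bit(lams, L, 10.0 ** mid, N) > target:
            lo = mid
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))

def cost_index(lams, L, rho1, rho2, N=1):
    """[log10 P_bit(rho2) + log10 P_bit(rho1)] * [log10 rho2 - log10 rho1]."""
    s = math.log10(p_bit(lams, L, rho2, N)) + math.log10(p_bit(lams, L, rho1, N))
    return s * (math.log10(rho2) - math.log10(rho1))

# Reference cyclic group code (illustrative): lambda = (1, 3), L = 8, N = 1.
reference = [1.0, 3.0]
rho1 = snr_for_target(reference, 8, 1e-3)
rho2 = snr_for_target(reference, 8, 1e-5)
index = cost_index(reference, 8, rho1, rho2)
```

Once `rho1` and `rho2` are fixed from the reference code, `cost_index` can be evaluated for any candidate parameters at the same two SNR points, so candidates are compared on an equal footing.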
2.3 Constellation Optimization



In this section, we perform a global optimization to find unitary
constellations of good performance. Our strategy is to first choose the bit
assignment and then search the code parameters to minimize the bit error
rate. The advantage of this strategy is that the objective function
$C(\mathcal{V})$ is a differentiable function and is amenable to
gradient-based optimization. On the other hand, if we first search for good
code matrices and then try to find the best bit assignment, we need to solve
a combinatorial optimization problem. In general, such a combinatorial
problem is not tractable for gradient-based optimization techniques because
the objective function is not continuous. The only method for solving such a
combinatorial problem is exhaustive random search. Unfortunately, for large
constellations, the search can be extremely inefficient.



2.3.1 Parameterization of Unitary Matrix 



In order to develop a gradient-based method for the minimization of the
performance measure $C(\mathcal{V})$, the first step is to choose a suitable
parameterization for unitary matrices. The application of unitary matrix
parameterization [33] in signal constellation design was pioneered by pQ.
We adopt this idea of using parameterized unitary code matrices. In general,
an $M \times M$ unitary matrix $U$ can be determined by a set of $M^2$
parameters $\Theta$ defined as follows:



<PpM G 



Vpq e 



Ok E 

9m £ 



7T 7T 

~2' 2 

-7T,7T 
7T 7T 

~2' 2. 

7T 7T" 

2' 2. 
-vr,vr) 



1 < P < q < M - 
1 < p < M - 1; 
1 < p < q < M; 



1; 



1, 



M- 1; 



More specifically, let U p,q ( 
unitary matrix such that 



' 1, 
cos(0 p? 
— sin ' 
sin' ' 
0, 



'pq) c 



denote a (M — p + l)-dimensional 



if j = k and j ^ {1, g — p + 1} 
if j = k and j € {1, ? — p + 1} 
if j = 1 and k = q — p + 1 
if = 1 and j = g — p + 1 
otherwise 
and let

f r = JJ^r+l jjr,r+2 _ _ _ jjr,M 

then, any unitary matrix U(Q) can be represented as 

u = ^m-i F i 

where 

& 1 



exp(i#Af) 



and 

for k = l,--.,M-2. 



exp(i/9 M -fc-i) 

r M_fc 



2.3.2 Gradient Method 

Here we develop explicit expressions for the gradient of the performance
measure $C(\mathcal{V})$. For the computation of $C(\mathcal{V})$ we need to
evaluate the union bound of the bit error rate, which depends on the bit
assignment. With regard to the bit assignment, our intuition is that, if we
first search for good code matrices and then try to find the best assignment
of bit patterns to code matrices, we need to cope with a combinatorial
optimization problem. Such a combinatorial problem is generally not
tractable for gradient-based optimization techniques because the objective
function is not differentiable. The available method for solving such a
combinatorial problem is random search. Unfortunately, for large
constellations, the search can be extremely difficult. In our design, we
therefore first fix the bit assignment and then search the code parameters
to minimize the bit error rate. In this way, the objective function is
differentiable and is amenable to gradient-based optimization techniques.

For simplicity, we use the binary-to-decimal conversion mapping scheme.
In such a scheme, a block of $b + \log_2(L)$ bits is mapped into a signal
matrix $A_q \Lambda_q^{\ell} B_q$ such that the first $b$ bits are the binary
representation of the block index $q$ and the remaining bits are the binary
representation of the diagonal index $\ell$. Let $d_H(p, q, \ell, \ell')$
denote the Hamming distance between the bits respectively assigned to signal
matrices $A_p \Lambda_p^{\ell} B_p$ and $A_q \Lambda_q^{\ell'} B_q$. The union
bound of the bit error probability is then given by

\[ P_{\mathrm{bit}} = \frac{2}{2^b L\,[b + \log_2(L)]} \left( P' + P'' \right) \tag{2} \]

where

\[ P' = \sum_{p=0}^{2^b-1} \sum_{\ell=0}^{L-2} \sum_{\ell'=\ell+1}^{L-1} d_H(p, p, \ell, \ell')\, P\left( \Lambda_p^{\ell}, \Lambda_p^{\ell'} \right) \]

and

\[ P'' = \sum_{p=0}^{2^b-2} \sum_{q=p+1}^{2^b-1} \sum_{\ell=0}^{L-1} \sum_{\ell'=0}^{L-1} d_H(p, q, \ell, \ell')\, P\left( A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q \right). \]

It can be seen that, using (2) to compute $P_{\mathrm{bit}}$, the number of
pair-wise error probabilities to be evaluated is

\[ 2^{b-1} L (L-1) + 2^{b-1} (2^b - 1) L^2. \]

The problem can still be solved using the steepest descent method for small
to moderate $L$. However, the computational complexity may be high for
large $L$. For proof of concept, we focus here on the special case of
$\Lambda_q = \Lambda$, $A_q = I$ for $q = 0, \ldots, 2^b - 1$, where,
exploiting the special structure of the constellation, the number of
pair-wise error probabilities to be computed can be substantially reduced to

\[ (L-1) + 2^{b-1} (2^b - 1)(2L - 1). \]

For this case, we have



Theorem 1 Let $d_H(p, q)$ denote the Hamming distance between the binary
representations of integers $p$ and $q$. Define

\[ w(k) = \sum_{\ell=0}^{L-k-1} d_H(\ell + k, \ell), \qquad k = 0, 1, \ldots, L-1. \]

Then 

2 

bit "L[6 + log 2 (L)] 

where 

2^—2 2^ — 1 L—l 

P'^EE E H\k\)+d n (p,q)]P(B p ,A k B q ). 

p=0 q=p+l k=-L+l 



L-1 
k=l 
See Appendix A for a proof. It should be noted that, to reduce
computation, w(k) can be pre-computed and saved as a lookup table.
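A small Python sketch of the lookup table follows. It also checks the regrouping that makes the table useful: when a pair-wise quantity depends on the two diagonal indices only through their difference k, a weighted sum over all index pairs collapses to a sum over k with weights w(k). The function names and the stand-in function `f` (which plays the role of the pair-wise bound as a function of k) are assumptions for illustration.

```python
def hamming(a, b):
    """Hamming distance between the binary representations of a and b."""
    return bin(a ^ b).count("1")

def weight_table(L):
    """w(k) = sum_{l=0}^{L-k-1} d_H(l + k, l) for k = 0, ..., L-1."""
    return [sum(hamming(l + k, l) for l in range(L - k)) for k in range(L)]

def pairwise_sum(L, f):
    """Direct sum over all index pairs l < l' of d_H(l, l') * f(l' - l)."""
    return sum(hamming(l, lp) * f(lp - l)
               for l in range(L - 1) for lp in range(l + 1, L))

def grouped_sum(L, f):
    """Same sum regrouped by the difference k, using the lookup table."""
    w = weight_table(L)
    return sum(w[k] * f(k) for k in range(1, L))

table = weight_table(4)
```

The direct sum evaluates `f` for L(L-1)/2 pairs, while the regrouped sum needs only L-1 evaluations, in line with the (L-1) term in the reduced count given before Theorem 1.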

In order to use the gradient descent method to minimize $C(\mathcal{V})$, we
need to find the direction of fastest descent at every step of the search.
Following the procedure in [1], we update $B_p$ as $B_p U(\Theta)$. In the
sequel, we shall show that the computation of the gradient of the
performance measure $C(\mathcal{V})$ reduces to the computation of: (i) the
partial derivatives of functions of the form $P(U(\Theta), \Phi)$ with respect
to $\Theta$ at $\Theta = 0$ (i.e., all elements of $\Theta$ are zero); (ii) the
partial derivatives of functions of the form $P(\Lambda^{\ell}, \Phi)$ with
respect to $\lambda_m$, where
$\Lambda = \mathrm{diag}([e^{i 2\pi\lambda_1/L}, \ldots, e^{i 2\pi\lambda_M/L}])$,
at $\Lambda = I$ (i.e., $\lambda_m = 0$, $m = 1, \ldots, M$).

We have derived surprisingly simple formulas for computing pair-wise
error probabilities and the related partial derivatives.



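Theorem 2 Let $U(\Theta)$ be a unitary matrix parameterized by $\Theta$. Let
$\Phi$ be a unitary matrix. Let

\[ \alpha = \frac{4(1 + 2\rho)}{\rho^2}, \qquad Q = \left[ (\alpha + 2) I - \Phi - \Phi^\dagger \right]^{-1} \Phi \]

and

\[ \Lambda = \mathrm{diag}\left( \left[ e^{i 2\pi\lambda_1/L}, \ldots, e^{i 2\pi\lambda_M/L} \right] \right). \]

Then

\[ P(I, \Phi) = \frac{\alpha^{MN}}{2 \left( \det\left[ (\alpha + 2) I - \Phi - \Phi^\dagger \right] \right)^{N}}, \tag{3} \]

\[ \left. \frac{\partial P(U, \Phi)}{\partial \varphi_{pq}} \right|_{\Theta = 0} = 2N\, P(I, \Phi)\, \Re\left( [Q]_{qp} - [Q]_{pq} \right), \tag{4} \]

\[ \left. \frac{\partial P(U, \Phi)}{\partial \theta_{k}} \right|_{\Theta = 0} = 2N\, P(I, \Phi)\, \Im\left( [Q]_{kk} \right), \tag{5} \]

\[ \left. \frac{\partial P(U, \Phi)}{\partial \psi_{pq}} \right|_{\Theta = 0} = 0, \tag{6} \]

\[ \left. \frac{\partial P(\Lambda^{\ell}, \Phi)}{\partial \lambda_{m}} \right|_{\Lambda = I} = \frac{4\pi N \ell}{L}\, P(I, \Phi)\, \Im\left( [Q]_{mm} \right). \tag{7} \]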


See Appendix B for a proof. At first glance, it is not clear how
Theorem 2 can be applied to the optimization of code matrices. From the
expression of our performance metric $C(\mathcal{V})$, it can be seen that it
suffices to compute the gradients of $P_{\mathrm{bit}}(\rho_1)$ and
$P_{\mathrm{bit}}(\rho_2)$ with respect to the code parameters. From (2), we
can see that, since the bit assignment is fixed, it suffices to compute the
gradients of $P(\Lambda_p^{\ell}, \Lambda_p^{\ell'})$ and
$P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q)$ with respect to the
code parameters for all combinations of $p$ and $q$. Note that the first
quantity can be viewed as a special case of the second. Hence, we focus on
the second quantity $P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q)$.
We first consider how to update the matrix $B_p$. In order to apply the
gradient descent method to minimize the performance metric, we need the
partial derivatives of the function
$P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q)$ with respect to the
parameters of $B_p$. It can be seen from the complexity of the function
$P(\cdot, \cdot)$ and the parameterization of $B_p$ that the direct
computation of the partial derivatives can be extremely difficult. Observe,
however, that at every step $B_p$ is updated to a new matrix $\tilde{B}_p$
which is also unitary. Hence, there must be a unitary matrix $U(\Theta)$ such
that $\tilde{B}_p = B_p U(\Theta)$. This means that we can update the unitary
matrices in a multiplicative way. As mentioned earlier, this method of
updating unitary matrices was proposed in [1]. In the same spirit as
conventional steepest-descent minimization, to make
$P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q)$ decrease as fast as
possible as $B_p$ varies to a new matrix, we can choose $U(\Theta)$ based on
the partial derivatives of
$P(A_p \Lambda_p^{\ell} B_p U(\Theta), A_q \Lambda_q^{\ell'} B_q)$ with respect
to $\Theta$ at $\Theta = 0$. The computation of the derivatives can be
accomplished by applying Theorem 2 and the following fact:

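$P(\cdot, \cdot)$ is invariant under unitary transforms. That is, for any
unitary matrices $X$ and $Y$,
$P(U_L X U_R, Y) = P(X, U_L^\dagger Y U_R^\dagger)$ for any unitary matrices
$U_L$ and $U_R$.

To prove this fact, we can use equation (28), which is shown in the
Appendix. By (28),

\[ P(U_L X U_R, Y) = \frac{\alpha^{MN}}{2 \left( \det\left[ \alpha I + (U_L X U_R - Y)(U_L X U_R - Y)^\dagger \right] \right)^{N}}. \]

Observing that

\[ (U_L X U_R - Y)(U_L X U_R - Y)^\dagger = U_L (X - U_L^\dagger Y U_R^\dagger)\, U_R U_R^\dagger\, (X - U_L^\dagger Y U_R^\dagger)^\dagger U_L^\dagger = U_L (X - U_L^\dagger Y U_R^\dagger)(X - U_L^\dagger Y U_R^\dagger)^\dagger U_L^\dagger, \]

we have

\[ \alpha I + (U_L X U_R - Y)(U_L X U_R - Y)^\dagger = U_L \left[ \alpha I + (X - U_L^\dagger Y U_R^\dagger)(X - U_L^\dagger Y U_R^\dagger)^\dagger \right] U_L^\dagger. \]

Hence

\[ \det\left[ \alpha I + (U_L X U_R - Y)(U_L X U_R - Y)^\dagger \right] = \det(U_L U_L^\dagger)\, \det\left[ \alpha I + (X - U_L^\dagger Y U_R^\dagger)(X - U_L^\dagger Y U_R^\dagger)^\dagger \right] = \det\left[ \alpha I + (X - U_L^\dagger Y U_R^\dagger)(X - U_L^\dagger Y U_R^\dagger)^\dagger \right] \]

and

\[ P(U_L X U_R, Y) = \frac{\alpha^{MN}}{2 \left\{ \det\left[ \alpha I + (X - U_L^\dagger Y U_R^\dagger)(X - U_L^\dagger Y U_R^\dagger)^\dagger \right] \right\}^{N}} = P(X, U_L^\dagger Y U_R^\dagger). \]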
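The invariance can also be confirmed numerically. The following Python sketch evaluates the determinant form of the bound for 2 x 2 unitary matrices and compares $P(U_L X U_R, Y)$ with $P(X, U_L^\dagger Y U_R^\dagger)$. The restriction to M = 2, the way the test matrices are generated (a rotation between two diagonal phase matrices), and all names and numerical values are assumptions made for the example.

```python
import cmath
import math

def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(X):
    """Conjugate transpose of a 2 x 2 matrix."""
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def unitary2(a, b, t, c, d):
    """diag(e^{ia}, e^{ib}) * rotation(t) * diag(e^{ic}, e^{id})."""
    D1 = [[cmath.exp(1j * a), 0j], [0j, cmath.exp(1j * b)]]
    R = [[complex(math.cos(t)), complex(-math.sin(t))],
         [complex(math.sin(t)), complex(math.cos(t))]]
    D2 = [[cmath.exp(1j * c), 0j], [0j, cmath.exp(1j * d)]]
    return matmul(matmul(D1, R), D2)

def pairwise_bound(X, Y, rho, N):
    """alpha^{2N} / (2 det[alpha I + (X - Y)(X - Y)^dagger]^N) for M = 2."""
    a = 4.0 * (1.0 + 2.0 * rho) / rho ** 2
    D = [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]
    G = matmul(D, dagger(D))
    det = ((a + G[0][0]) * (a + G[1][1]) - G[0][1] * G[1][0]).real
    return a ** (2 * N) / (2.0 * det ** N)

X = unitary2(0.3, -1.2, 0.8, 2.0, 0.1)
Y = unitary2(-0.5, 0.9, 1.9, 0.4, -2.2)
UL = unitary2(1.1, 0.2, -0.6, 0.7, 1.5)
UR = unitary2(-2.0, 0.6, 0.35, -0.9, 0.05)

lhs = pairwise_bound(matmul(matmul(UL, X), UR), Y, 6.0, 2)
rhs = pairwise_bound(X, matmul(matmul(dagger(UL), Y), dagger(UR)), 6.0, 2)
```

The two values `lhs` and `rhs` agree to rounding error. This is the property that lets every pair-wise term be rewritten with the identity as its first argument before Theorem 2 is applied.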

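This proves the invariance property. An immediate consequence of this
property is

\[ P\left( A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q \right) = P\left( I, (A_p \Lambda_p^{\ell} B_p)^\dagger A_q \Lambda_q^{\ell'} B_q \right), \]

which implies that we can let
$\Phi = (A_p \Lambda_p^{\ell} B_p)^\dagger A_q \Lambda_q^{\ell'} B_q$ and apply
(3) to compute $P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q)$.

Making use of this property, we have

\[ P\left( A_p \Lambda_p^{\ell} B_p U(\Theta), A_q \Lambda_q^{\ell'} B_q \right) = P\left( U(\Theta), (A_p \Lambda_p^{\ell} B_p)^\dagger A_q \Lambda_q^{\ell'} B_q \right). \]

If we identify $(A_p \Lambda_p^{\ell} B_p)^\dagger A_q \Lambda_q^{\ell'} B_q$ as
$\Phi$, we have

\[ P\left( A_p \Lambda_p^{\ell} B_p U(\Theta), A_q \Lambda_q^{\ell'} B_q \right) = P(U(\Theta), \Phi). \]

Hence, the partial derivatives of
$P(A_p \Lambda_p^{\ell} B_p U(\Theta), A_q \Lambda_q^{\ell'} B_q)$ can be
computed by applying Theorem 2.

Similarly, we can update $A_p$ as $A_p U(\Theta)$ and compute the partial
derivatives of

\[ P\left( A_p U(\Theta) \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q \right) = P\left( U(\Theta), A_p^\dagger A_q \Lambda_q^{\ell'} B_q (\Lambda_p^{\ell} B_p)^\dagger \right) \]

with respect to $\Theta$ at $\Theta = 0$. The calculation can be done by
identifying $A_p^\dagger A_q \Lambda_q^{\ell'} B_q (\Lambda_p^{\ell} B_p)^\dagger$
as $\Phi$ and applying Theorem 2.

In the same spirit, we can update $\Lambda_p$ as $\Lambda_p \Lambda$ and
compute the partial derivatives of

\[ P\left( A_p (\Lambda_p \Lambda)^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q \right) = P\left( \Lambda^{\ell}, (A_p \Lambda_p^{\ell})^\dagger A_q \Lambda_q^{\ell'} B_q B_p^\dagger \right) \]

with respect to $\lambda$ at $\Lambda = I$ (i.e., $\lambda_m = 0$,
$m = 1, \ldots, M$). This can be accomplished by letting

\[ \Phi = (A_p \Lambda_p^{\ell})^\dagger A_q \Lambda_q^{\ell'} B_q B_p^\dagger \]

and invoking Theorem 2.

Finally, because of symmetry, we have

\[ P\left( A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q \right) = P\left( A_q \Lambda_q^{\ell'} B_q, A_p \Lambda_p^{\ell} B_p \right). \]

Hence, we can update matrices $A_q$, $\Lambda_q$ and $B_q$ and compute the
corresponding partial derivatives by the same method as for matrices $A_p$,
$\Lambda_p$ and $B_p$.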

In the gradient-based optimization, we used the standard steepest gradient
descent method in [56], with some minor modification to adapt to parameter
bounds. In the course of experimenting with the new design paradigm, we
have observed that it is beneficial to apply the following search strategy.

STEP (a). Find the best constellation of diagonal structure
$\{\Lambda^{\ell} \mid 0 \leq \ell \leq L-1\}$. This can be done as follows.
First, perform a random search to find n good initial values of $\Lambda$.
Second, for each initial value of $\Lambda$, perform gradient-based
optimization. Finally, choose the best one among the n outcomes.

STEP (b). Let $\Lambda$ be found in the first step. Find the best
constellation of special structure
$\{\Lambda^{\ell} B_q \mid \ell = 0, \ldots, L-1;\ q = 0, \ldots, 2^b - 1\}$ by
employing gradient descent search over $B_q$ while $\Lambda$ is fixed. Here
the initial value of $B_q$ can be randomly chosen.

STEP (c). Using the code found in the second step as the starting point,
search $A_q$, $B_q$ and $\Lambda_q$ by the gradient descent method. Here the
initial value of $A_q$ can be randomly chosen.

For Steps (a)-(c) in the above strategy, we have adopted the same choice
of the step size as that of the algorithm of [56].
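Step (a) can be sketched in Python as a random search followed by a local descent on the diagonal parameters. The sketch departs from the method described above in two ways that should be kept in mind: it uses a finite-difference gradient instead of the analytical expressions of Theorem 2, and it uses a simple step-halving rule rather than the step size of [56]. The SNR points (20 dB and 30 dB), the function names and all other values are illustrative assumptions.

```python
import cmath
import math
import random

def hamming(a, b):
    return bin(a ^ b).count("1")

def log10_p_bit(lams, L, rho, N=1):
    """log10 of the union bound for the diagonal code {Lambda^l}, b = 0."""
    g = rho ** 2 / (4.0 * (1.0 + 2.0 * rho))
    total = 0.0
    for k in range(1, L):
        # For a diagonal code the pair-wise bound depends only on k = l' - l.
        pair = 0.5
        for lam in lams:
            sigma_sq = abs(1.0 - cmath.exp(2j * math.pi * lam * k / L)) ** 2
            pair *= (1.0 + g * sigma_sq) ** (-N)
        total += sum(hamming(l + k, l) for l in range(L - k)) * pair
    return math.log10(2.0 * total / (L * math.log2(L)))

def cost(lams, L, rho1, rho2):
    """Index C for the diagonal code at the two SNR points."""
    return ((log10_p_bit(lams, L, rho2) + log10_p_bit(lams, L, rho1))
            * (math.log10(rho2) - math.log10(rho1)))

def random_starts(M, L, trials, keep, rho1, rho2, rng):
    """Random search: draw `trials` parameter vectors, keep the best `keep`."""
    cands = [[1.0] + [rng.uniform(0.0, L) for _ in range(M - 1)]
             for _ in range(trials)]
    cands.sort(key=lambda lams: cost(lams, L, rho1, rho2))
    return cands[:keep]

def descend(lams, L, rho1, rho2, step=0.05, iters=40, h=1e-5):
    """Projected descent with a numerical gradient; lambda_1 = 1 stays fixed.
    A step is accepted only if it lowers the cost, otherwise it is halved."""
    lams, best = list(lams), cost(lams, L, rho1, rho2)
    for _ in range(iters):
        grad = [0.0]
        for m in range(1, len(lams)):
            up, dn = list(lams), list(lams)
            up[m] += h
            dn[m] -= h
            grad.append((cost(up, L, rho1, rho2) - cost(dn, L, rho1, rho2))
                        / (2.0 * h))
        trial = [lams[0]] + [min(max(lams[m] - step * grad[m], 0.0), L - 1e-9)
                             for m in range(1, len(lams))]
        c = cost(trial, L, rho1, rho2)
        if c < best:
            lams, best = trial, c
        else:
            step *= 0.5
    return lams, best

rng = random.Random(1)
starts = random_starts(3, 8, 30, 3, 100.0, 1000.0, rng)
results = [descend(s, 8, 100.0, 1000.0) for s in starts]
best_lams, best_cost = min(results, key=lambda r: r[1])
```

Running several descents from the best random candidates and keeping the overall winner mirrors the three sub-steps of Step (a); Steps (b) and (c) would extend the same loop to the parameters of the unitary blocks.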

3 Fast Decoding 

Now that we have efficient constellation design tools, we focus on the
all-important decoding problem. In this section, we develop efficient
algorithms for decoding our proposed new codes. Interestingly, such decoding
algorithms are also applicable to existing codes. For ease of presentation,
we first focus on the case in which the constellation has only one block
(i.e., $b = 0$) and the receiver is equipped with only one antenna (i.e.,
$N = 1$). Subsequently, we discuss the decoding for the general cases of
multiple blocks and multiple receiver antennas (i.e., $b > 0$ and $N > 1$).

When $b = 0$, the constellation reduces to the continuous diagonal code.
The signal constellation consists of $L$ diagonal matrices
$V_\ell = \Lambda^{\ell}$, $\ell = 0, 1, \ldots, L-1$. For $N = 1$, the received
signal $X_\tau \in \mathbb{C}^{M \times 1}$ is a complex vector. As described
in [5], the ML decoding problem can be reformulated as a problem of
minimizing a Euclidean norm as follows:

\[ \hat{\ell}_{\mathrm{ML}} = \arg\min_{\ell} \| X_\tau - V_\ell X_{\tau-1} \|_F \approx \arg\min_{\ell} \sum_{m=1}^{M} \left[ (C_m \lambda_m \ell - C_m \varphi_m) \ \mathrm{mod}^*\ C_m L \right]^2 \tag{8} \]

where

\[ C_m = \sqrt{ \left| [X_\tau]_{m1}\, [X_{\tau-1}]_{m1} \right| }, \qquad \varphi_m = \arg\left( \frac{[X_\tau]_{m1}}{[X_{\tau-1}]_{m1}} \right) \frac{L}{2\pi}. \]

It has been demonstrated in [5] that the approximation in (8) is extremely
accurate. Therefore, the decoding problem for the case $b = 0$, $N = 1$ has
been transformed into the minimization problem of finding

\[ \hat{\ell}_{\mathrm{eucl}} = \arg\min_{\ell} \sum_{m=1}^{M} \left[ (C_m \lambda_m \ell - C_m \varphi_m) \ \mathrm{mod}^*\ C_m L \right]^2. \tag{9} \]

In the following subsections we develop an efficient algorithm for this
minimization problem.
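Before turning to fast algorithms, the reduced metric can be illustrated with an exhaustive Python decoder for the case b = 0, N = 1. It computes $C_m$ and $\varphi_m$ from two consecutive received vectors and minimizes the sum of squared symmetric remainders over all L candidates. The function names and the example values are assumptions, and the exhaustive minimization is only a stand-in for the efficient algorithms developed in this section.

```python
import cmath
import math

def sym_mod(y, x):
    """Symmetric remainder of y modulo x, with range [-x/2, x/2)."""
    return y - x * math.floor(y / x + 0.5)

def reduce_received(X, X_prev, L):
    """C_m = sqrt(|X_m * Xprev_m|) and phi_m = arg(X_m / Xprev_m) * L / (2 pi)."""
    C = [math.sqrt(abs(x * xp)) for x, xp in zip(X, X_prev)]
    phi = [cmath.phase(x / xp) * L / (2.0 * math.pi)
           for x, xp in zip(X, X_prev)]
    return C, phi

def reduced_metric(ell, C, phi, lams, L):
    """sum_m [ (C_m lam_m ell - C_m phi_m) mod* (C_m L) ]^2."""
    return sum(sym_mod(C[m] * lams[m] * ell - C[m] * phi[m], C[m] * L) ** 2
               for m in range(len(lams)))

def decode_eucl(X, X_prev, lams, L):
    """Exhaustive minimization of the reduced metric over ell = 0, ..., L-1."""
    C, phi = reduce_received(X, X_prev, L)
    return min(range(L), key=lambda ell: reduced_metric(ell, C, phi, lams, L))

# Noise-free example with illustrative values: M = 2, L = 16.
lams = [1.0, 2.7]
L = 16
X_prev = [0.8 + 0.3j, -0.5 + 1.1j]
decoded = []
for z in range(L):
    X = [cmath.exp(2j * math.pi * lams[m] * z / L) * X_prev[m]
         for m in range(2)]
    decoded.append(decode_eucl(X, X_prev, lams, L))
```

In the noise-free case the metric is zero at the transmitted index and, because $\lambda_1 = 1$, strictly positive elsewhere, so `decoded` reproduces the transmitted indices. Each term of the metric involves one coordinate only, which is what later allows the coordinates to be processed as separate one-dimensional problems.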
3.1 Lattice Decoding Algorithms

In the special case that $\lambda_m$, $m = 1, \cdots, M$ are integers, the continuous
diagonal code reduces to the cyclic group code. In order to decode the
cyclic group code, Clarkson et al. [5] developed an approximate solution for
the minimization problem (9). The key steps are as follows:

1. Reformulate minimization problem (9) as a lattice closest point problem

$$\arg\min_{y \in \mathbb{Z}^{1 \times M}} \|yG - \xi\| \qquad (10)$$

where $\xi = [\xi_1, \cdots, \xi_M]$ with $\xi_m = C_m \varphi_m$, $m = 1, \cdots, M$ and $G$ is an
$M \times M$ generator matrix such that

$$[G]_{pq} = \begin{cases} C_q \lambda_q & \text{for } p = 1 \text{ and } 1 \le q \le M, \\ C_q L & \text{for } 1 < p = q \le M, \\ 0 & \text{else.} \end{cases}$$

2. Apply the "LLL" lattice algorithm [26] to find an approximate solution
$\hat{y} = [\hat{y}_1, \cdots, \hat{y}_M]$ for (10). An estimate for $\hat{\ell}_{ucl}$ is taken as $\hat{y}_1 \bmod L$.
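The structure of the generator matrix in step 1 can be made concrete with a short sketch. The helper names `generator_matrix` and `row_times_matrix` are hypothetical; the code only illustrates that the $m$-th entry of $yG$ equals $C_m(\lambda_m y_1 + L y_m)$ for $m \ge 2$ and $C_1 \lambda_1 y_1$ for $m = 1$, which is the property the later dimension reduction relies on.

```python
def generator_matrix(C, lam, L):
    """Build the M x M matrix G: first row C_q*lam_q, diagonal C_q*L for rows 2..M."""
    M = len(C)
    G = [[0.0] * M for _ in range(M)]
    for q in range(M):
        G[0][q] = C[q] * lam[q]      # row p = 1
    for p in range(1, M):
        G[p][p] = C[p] * L           # diagonal entries for 1 < p = q <= M
    return G


def row_times_matrix(y, G):
    """Compute the row vector y*G."""
    M = len(G)
    return [sum(y[p] * G[p][q] for p in range(M)) for q in range(M)]
```

For example, with $C = [1, 2, 0.5]$, $\lambda = [1, 3, 5]$, $L = 8$ and $y = [3, -1, 2]$, the product is $yG = [3, 2, 15.5]$.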



While the "LLL" lattice algorithm approximately solves (10), existing
sphere decoder algorithms (see, e.g., [2], [5], [8], [9], [38] and the references
therein) can provide an exact solution for (10) and hence improve decoding
accuracy. The sphere decoder takes advantage of the lattice structure of the
received signals and proceeds as follows: (i) it searches for the lattice
points closest to the received signal among those enclosed in a sphere centered
at the received signal; (ii) each time a lattice point of a smaller norm is found, it
reduces the sphere radius accordingly and restarts the search until an empty
sphere is reached. The choice of initial radius depends on the lattice considered,
as well as on the additive noise level. At the heart of the sphere
decoder algorithm is a subroutine which serves the purposes of: (a) determining
whether a sphere $\|yG - \xi\|^2 < \gamma^2$ with fixed radius $\gamma > 0$ is empty;
(b) detecting a vector in it otherwise. A Cholesky factorization is performed
to find an upper triangular matrix $D$ so that $D^\top D = GG^\top$, from which the
boundary conditions of the sphere can be derived as

$$z_k - \sqrt{\frac{\vartheta_k}{t_{kk}}} - w_k \le y_k \le \sqrt{\frac{\vartheta_k}{t_{kk}}} - w_k + z_k, \qquad k = M, M-1, \cdots, 1 \qquad (11)$$

where

$$[z_1, \cdots, z_M] = \xi G^{-1}, \qquad w_k = \sum_{j=k+1}^{M} t_{kj} (y_j - z_j), \quad k = 1, \cdots, M$$

and

$$\vartheta_M = \gamma^2, \qquad \vartheta_{k-1} = \vartheta_k - t_{kk} (y_k - z_k + w_k)^2, \quad k = 2, \cdots, M$$

with $t_{kk} = [D]_{kk}^2$, $k = 1, \cdots, M$ and $t_{kj} = [D]_{kj} / [D]_{kk}$, $1 \le k < j \le M$
(see the sphere decoder references cited above for details). Clearly, the boundary of $y_k$
depends on the values of $y_j$, $j = k+1, \cdots, M$. If the set of feasible values of $y_M$,
denoted by $\mathcal{I}_M$, is not empty, then for each member $y_M$ of $\mathcal{I}_M$ the values of the
other coordinates that need to be evaluated in the sphere decoder algorithm can be
represented as the nodes of a tree starting from $y_M$. Figure 2 depicts this
tree structure. In the tree, the children nodes are generated from the parent
nodes in accordance with the boundary equation (11). A path of length $M$
(i.e., consisting of $M$ nodes) corresponds to a vector located in the sphere.
When $\mathcal{I}_M$ has multiple members, the task of the core subroutine is to search
among the multiple trees to determine whether there is a path of length $M$
and identify one if it exists. It should be noted that, in sphere decoding,
most of the computational effort is devoted to the evaluation of paths of
length less than $M$.

Figure 2: A tree representation of values of coordinates to be investigated
for a fixed $y_M$ (illustrated for $M = 4$).
3.2 Removing the Curse of Dimensionality

In the general case that $\lambda_m$, $m = 1, \cdots, M$ are continuous parameters, the
minimization problem in (9) lacks the lattice structure. Hence, existing
sphere decoder algorithms and the "LLL" lattice algorithm are not applicable.
Moreover, even for the special case of the cyclic group code, the lattice decoding
algorithms described in the last subsection aim to solve a closest point problem
of dimension $M$ (the dimension expands to $MN$ when using $N$
receiver antennas). The computational complexity may be too high when
the number of transmitter antennas $M$ (or the number of receiver antennas
$N$) is large. Therefore, it is crucial to reduce the dimension of the underlying
closest point problem by further exploiting the diagonal structure of the signal
constellation. We achieve the reduction and improve efficiency with a new
decoding algorithm, applicable to the general case that $\lambda_m$, $m = 1, \cdots, M$
are continuous parameters. As a critical step to reduce decoding complexity,
we will show next that the dimension of the related closest point problem
can be reduced to one.



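Theorem 3 Define

$$S = \Big\{ (y_1, \cdots, y_M) \;\Big|\; y_1 \in \mathbb{Z} \text{ and } -\frac{L}{2} + \varphi_1 \le y_1 < \frac{L}{2} + \varphi_1, \;\; y_m \in \mathbb{Z} \text{ and } -\frac{L}{2} \le L y_m + \lambda_m y_1 - \varphi_m < \frac{L}{2} \text{ for } m = 2, \cdots, M \Big\}.$$

Suppose that there exists a unique $\hat{\ell} \in \{0, 1, \cdots, L-1\}$ such that

$$\sum_{m=1}^{M} \left[ (C_m \lambda_m \hat{\ell} - C_m \varphi_m) \bmod^* C_m L \right]^2 = \min_{\ell \in \{0, \cdots, L-1\}} \sum_{m=1}^{M} \left[ (C_m \lambda_m \ell - C_m \varphi_m) \bmod^* C_m L \right]^2.$$

Then

$$\hat{\ell} = \hat{y}_1 \bmod L \quad \text{and} \quad \sum_{m=1}^{M} \left[ (C_m \lambda_m \hat{\ell} - C_m \varphi_m) \bmod^* C_m L \right]^2 = \|\hat{y} G - \xi\|^2$$

where $\hat{y}_1$ is the first entry of

$$\hat{y} = [\hat{y}_1, \cdots, \hat{y}_M] = \arg\min_{y \in S} \|yG - \xi\|^2.$$

See Appendix C for a proof.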

It can be seen from Theorem 3 that $y_q$, for $q = 2, \cdots, M$, is uniquely
determined by $y_1$. Hence, finding

$$\arg\min_{y \in S} \|yG - \xi\|^2 \qquad (12)$$

is essentially a one-dimensional closest point problem.

Next, by exploiting the special structure of the constellation, we derive
extremely simple boundary conditions for the sphere $\{ y \in S \mid \|yG - \xi\|^2 < \gamma^2 \}$.



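Theorem 4 Let $y = [y_1, \cdots, y_M] \in S$. Define

$$\mathcal{M}_1 = [C_1 (y_1 - \varphi_1)]^2$$

and

$$\mathcal{M}_m = \mathcal{M}_{m-1} + [C_m (L y_m + \lambda_m y_1 - \varphi_m)]^2$$

for $m = 2, \cdots, M$. Then $\|yG - \xi\|^2 < \gamma^2$ if and only if $y_1$ is an integer
satisfying

$$\varphi_1 - \frac{\gamma}{C_1} < y_1 < \varphi_1 + \frac{\gamma}{C_1}, \qquad -\frac{L}{2} + \varphi_1 \le y_1 < \frac{L}{2} + \varphi_1 \qquad (13)$$

and

$$\mathcal{M}_m < \gamma^2 \quad \text{for } m = 2, \cdots, M. \qquad (14)$$

See Appendix D for a proof.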

It can be seen that the conditions in (13) determine an interval $\mathcal{I}_1$ of
feasible values for $y_1$. For each value of $y_1 \in \mathcal{I}_1$, we only need to evaluate
the simple conditions in (14). This is in sharp contrast to the search over a
tree structure described above in the context of the sphere decoder.

It should be noted that the sphere decoding algorithm was originally devised
to find closest lattice points. In general, our decoding problem is not
a problem of searching for closest lattice points. However, we can still use the
sphere decoding algorithm because the enumeration of interior points of a
sphere can be done as efficiently as in the case of a lattice problem. Moreover,
Theorem 4 indicates that the "sphere" can actually be reduced to an
"interval" of one dimension.
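A small worked instance may help to see how the one-dimensional test operates. The numbers below are purely illustrative assumptions (they are not taken from the simulations): $M = 2$, $L = 8$, $C = [1, 2]$, $\lambda = [1, 3]$, $\varphi = [-2.8, -1.3]$ and $\gamma^2 = 0.5$, with the residual for $m = 2$ taken in $[-L/2, L/2)$.

```latex
% Feasible interval for y_1: the first term must satisfy [C_1 (y_1 - \varphi_1)]^2 < \gamma^2.
\varphi_1 - \frac{\gamma}{C_1} < y_1 < \varphi_1 + \frac{\gamma}{C_1}
\;\Longrightarrow\; -3.507 < y_1 < -2.093
\;\Longrightarrow\; y_1 = -3 .

% The remaining coordinate is fixed by y_1: choose the integer y_2 with
% -L/2 \le L y_2 + \lambda_2 y_1 - \varphi_2 < L/2.
L y_2 + \lambda_2 y_1 - \varphi_2 = 8 y_2 - 9 + 1.3 \;\Longrightarrow\; y_2 = 1,
\quad \text{residual } 0.3 .

% Partial sums checked against \gamma^2 = 0.5.
\mathcal{M}_1 = [1 \cdot (-3 + 2.8)]^2 = 0.04, \qquad
\mathcal{M}_2 = 0.04 + [2 \cdot 0.3]^2 = 0.40 < 0.5 .
```

Only one candidate value of $y_1$ has to be examined here, and the estimate is $y_1 \bmod L = 5$.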
3.3 Simplified Sphere Decoder

Using the reduction of dimensionality described in the last subsection, we
now develop a new decoding algorithm which also applies to continuous diagonal
codes, FPF codes $G_{m,r}$, non-group codes $S_{m,s}$ and products of groups.
To further enhance efficiency, we adopt the "zigzag" searching strategy that
originated in [37] and the idea proposed in [5] for avoiding repeated computations.

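Obviously, the order in which values of $y_1$ are searched can significantly affect
the efficiency. Let $y$ denote the vector corresponding to the transmitted signal.
Intuitively, for moderate and high SNR, the received signal $\xi$ is likely to be
close to $yG$. Since $C_1 |y_1 - \varphi_1| \le \|yG - \xi\|$, we should first investigate
the values of $y_1$ that are closer to $\varphi_1$ for a better chance of detecting $y$.
Therefore, we shall investigate $y_1$ in the following sequence,

$$y_1 = [\varphi_1] + (-1)^k \left\lceil \frac{k}{2} \right\rceil, \qquad k = 0, 1, 2, \cdots \qquad (15)$$

That is, the investigation starts from $[\varphi_1]$ and proceeds in a "zigzag"
order in the outward directions (see, e.g., [2], [37]). Note that condition (13)
implies

$$-\left\lceil \min\left\{ \frac{\gamma}{C_1}, \frac{L}{2} \right\} \right\rceil \le y_1 - [\varphi_1] \le \left\lceil \min\left\{ \frac{\gamma}{C_1}, \frac{L}{2} \right\} \right\rceil.$$

Hence, it suffices to investigate

$$y_1 = [\varphi_1] + (-1)^k \left\lceil \frac{k}{2} \right\rceil, \qquad k = 0, 1, \cdots, 2 \left\lceil \min\left\{ \frac{\gamma}{C_1}, \frac{L}{2} \right\} \right\rceil.$$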
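The zigzag ordering can be sketched in a few lines. This is a minimal illustration that assumes the offset in (15) is $\lceil k/2 \rceil$, so that the candidates alternate below and above the starting point; the function name `zigzag` is hypothetical.

```python
import math


def zigzag(center, k_max):
    """Yield center, center-1, center+1, center-2, ... for k = 0, ..., k_max."""
    for k in range(k_max + 1):
        yield center + (-1) ** k * math.ceil(k / 2)
```

For instance, `list(zigzag(5, 4))` returns `[5, 4, 6, 3, 7]`, so values nearest to the starting point are always examined first.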



It is also important to avoid repeated investigation of $y_1$. When a value
of $y_1$ is found to satisfy condition (14), the radius $\gamma$ is reduced to $\sqrt{\mathcal{M}_M}$ and
the interval confining $y_1$ shrinks accordingly. In this way, the range of $y_1$
that needs to be investigated is squeezed from outside. To improve efficiency,
we use the idea of [8] to ensure that the range of $y_1$ that needs to be investigated
is also squeezed from inside. The idea is based on the following observation:

For a given radius $\gamma$, if a value of $y_1$ violates the boundary conditions,
then the same value of $y_1$ also violates the corresponding boundary conditions
after $\gamma$ is reduced.

Therefore, we can keep a record of the values of $y_1$ which have been
investigated in order to avoid repeated computation. For this purpose, the
index variable $k$ in (15) can be used as an indicator for the range of values
investigated.

In summary, the decoding algorithm is presented as follows. 
STEP 1. Input $\gamma \leftarrow \gamma_{init}$, where the initial radius $\gamma_{init}$ is chosen based on the noise
level. Let $k \leftarrow 0$ and $\hat{y}_1 \leftarrow [\varphi_1]$.

STEP 2. Let $k_{max} \leftarrow 2 \left\lceil \min\left\{ \frac{\gamma}{C_1}, \frac{L}{2} \right\} \right\rceil$.

STEP 3. If $k \le k_{max}$, let $y_1 \leftarrow [\varphi_1] + (-1)^k \lceil k/2 \rceil$ and $k \leftarrow k + 1$. Otherwise,
let $\gamma \leftarrow c\gamma$ for a fixed factor $c > 1$, $k \leftarrow 0$ and go to STEP 2.

STEP 4. If condition (14) is violated, go to STEP 3. Otherwise, let $\hat{y}_1 \leftarrow y_1$,
$\gamma \leftarrow \sqrt{\mathcal{M}_M}$.

STEP 5. Using $\gamma$, $\hat{y}_1$ and $k$ as input, call subroutine CLOSEST-POINT to
find $\hat{y}_1$. Then $\hat{\ell}_{ucl}$ is calculated as $\hat{\ell}_{ucl} = \hat{y}_1 \bmod L$. Return $\hat{\ell}_{ucl}$
as the estimate and stop.

The subroutine CLOSEST-POINT is presented as follows.

Function: CLOSEST-POINT
STEP 1. Input $\gamma$, $\hat{y}_1$ and $k$. Let $y_1 \leftarrow \hat{y}_1$.
STEP 2. Let $k_{max} \leftarrow 2 \left\lceil \min\left\{ \frac{\gamma}{C_1}, \frac{L}{2} \right\} \right\rceil$.

STEP 3. If $k \le k_{max}$, let $y_1 \leftarrow [\varphi_1] + (-1)^k \lceil k/2 \rceil$ and $k \leftarrow k + 1$. Otherwise,
go to STEP 5.

STEP 4. If condition (14) is satisfied, then let $\hat{y}_1 \leftarrow y_1$, $\gamma \leftarrow \sqrt{\mathcal{M}_M}$ and go
to STEP 2. Otherwise, go to STEP 3.

STEP 5. Return $\hat{y}_1$ and stop.

It can be seen from STEP 3 that the index $k$ serves the purpose of
avoiding repeated investigations. Once a nonempty sphere is detected, no
value of $y_1$ is investigated more than once among the subsequent smaller
spheres.

It should also be noted that, compared to conventional sphere decoder
algorithms, many computationally expensive steps are avoided in
our algorithm. For example, the Cholesky factorization of $GG^\top$ and the
computation of $\xi G^{-1}$ are not needed in our algorithm.
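The steps above can be sketched as a single loop. The code is a hedged illustration rather than the authors' implementation: the names `path_metric` and `simplified_sphere_decode` and the enlargement factor `grow` are assumptions, $\lambda_1 = 1$ is assumed so that the first term is $[C_1(y_1 - \varphi_1)]^2$, the residuals are taken in $[-L/2, L/2)$, and the example values use integer $\lambda_m$ (the cyclic group case).

```python
import math


def path_metric(y1, C, lam, phi, L, gamma_sq):
    """Return the full sum M_M for this y1 if the sphere test passes, else None."""
    d = y1 - phi[0]
    if not (-L / 2 <= d < L / 2):           # range condition on y1
        return None
    total = (C[0] * d) ** 2                 # M_1, assuming lam[0] == 1
    if total >= gamma_sq:
        return None
    for m in range(1, len(C)):
        t = lam[m] * y1 - phi[m]
        y_m = -math.floor(t / L + 0.5)      # integer with -L/2 <= L*y_m + t < L/2
        total += (C[m] * (L * y_m + t)) ** 2
        if total >= gamma_sq:               # partial sum already outside the sphere
            return None
    return total


def simplified_sphere_decode(C, lam, phi, L, gamma_init, grow=1.5):
    """One-dimensional zigzag search with a shrinking radius; returns y1 mod L."""
    center = math.floor(phi[0] + 0.5)       # nearest integer to phi_1
    gamma_sq = gamma_init ** 2
    best, k = None, 0
    while True:
        k_max = 2 * math.ceil(min(math.sqrt(gamma_sq) / C[0], L / 2))
        if k > k_max:
            if best is not None:
                return best % L
            gamma_sq *= grow ** 2           # empty sphere: enlarge and restart
            k = 0
            continue
        y1 = center + (-1) ** k * math.ceil(k / 2)
        k += 1
        total = path_metric(y1, C, lam, phi, L, gamma_sq)
        if total is not None:
            best, gamma_sq = y1, total      # shrink the radius to sqrt(M_M)
```

Because `k` is never reset after a point has been found, values of $y_1$ already rejected are not revisited when the radius shrinks, which mirrors the role of the index $k$ in the subroutine.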
3.4 Sphere Decoding - The General Case

We now discuss the decoding problem for the general case of multiple receiver
antennas and multiple block constellations (i.e., $N > 1$, $b > 0$). Since
ML decoding is computationally difficult, our goal is to develop a sub-optimal
decoding method with low decoding complexity. Note that ML
decoding finds

$$\arg\min_{\ell, q} \|X_\tau - A_q \Lambda_q^\ell B_q X_{\tau-1}\|_F^2, \qquad (16)$$

which can be done by obtaining

$$\arg\min_{\ell} \|X_\tau - A_q \Lambda_q^\ell B_q X_{\tau-1}\|_F^2 \qquad (17)$$

for $q = 0, 1, \cdots, 2^b - 1$ and seeking the tuple $(q, \ell)$ minimizing the Frobenius
norm. This method is of a sequential nature and has been used in [39] for
decoding the FPF code $G_{m,r}$, the non-group code $S_{m,s}$ and products of groups, with
the "LLL" lattice algorithm sequentially applied to solve (17).

In the general case the underlying closest point problem is of dimension
$MN$ and sequential approaches are inefficient. We use the simplified sphere
decoder algorithm developed in the previous subsection and transform the
decoding problem into $2^b$ one-dimensional closest point problems that can
be solved in parallel.

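Since the Frobenius norm of a matrix is invariant under unitary transformations,
we have

$$\|X_\tau - A_q \Lambda_q^\ell B_q X_{\tau-1}\|_F^2 = \|A_q^\dagger (X_\tau - A_q \Lambda_q^\ell B_q X_{\tau-1})\|_F^2 = \|A_q^\dagger X_\tau - \Lambda_q^\ell B_q X_{\tau-1}\|_F^2.$$

By a similar method as that of [5], we can show that

$$\begin{aligned}
\|A_q^\dagger X_\tau - \Lambda_q^\ell B_q X_{\tau-1}\|_F^2
&= \sum_{n=1}^{N} \sum_{m=1}^{M} \left| [A_q^\dagger X_\tau]_{mn} - e^{i 2\pi \lambda_{q,m} \ell / L} [B_q X_{\tau-1}]_{mn} \right|^2 \\
&= \|X_\tau\|_F^2 + \|X_{\tau-1}\|_F^2 - 2 \sum_{n=1}^{N} \sum_{m=1}^{M} C_{m,n}^2 \cos\left( [(\lambda_{q,m} \ell - \varphi_{m,n}) \bmod^* L] \, 2\pi / L \right) \\
&\approx \|X_\tau\|_F^2 + \|X_{\tau-1}\|_F^2 - 2 \sum_{n=1}^{N} \sum_{m=1}^{M} C_{m,n}^2
 + \sum_{n=1}^{N} \sum_{m=1}^{M} C_{m,n}^2 \left( [(\lambda_{q,m} \ell - \varphi_{m,n}) \bmod^* L] \, 2\pi / L \right)^2 \\
&= \frac{4\pi^2}{L^2} \left( \sum_{n=1}^{N} \sum_{m=1}^{M} \left[ (C_{m,n} \lambda_{q,m} \ell - C_{m,n} \varphi_{m,n}) \bmod^* C_{m,n} L \right]^2 + \Delta_q \right)
\end{aligned}$$

where

$$\varphi_{m,n} = \arg\left( \frac{[A_q^\dagger X_\tau]_{mn}}{[B_q X_{\tau-1}]_{mn}} \right) \frac{L}{2\pi}, \qquad
C_{m,n} = \sqrt{\left| [A_q^\dagger X_\tau]_{mn} [B_q X_{\tau-1}]_{mn} \right|}$$

and

$$\Delta_q = \frac{L^2 \, \|\mathrm{abs}(A_q^\dagger X_\tau) - \mathrm{abs}(B_q X_{\tau-1})\|_F^2}{4\pi^2}.$$

For $k = 1, \cdots, MN$, let $m_k = k - \lfloor (k-1)/M \rfloor M$ and $n_k = \lfloor (k-1)/M \rfloor + 1$, and define
$C_k = C_{m_k, n_k}$, $\varphi_k = \varphi_{m_k, n_k}$ and $\lambda_k = \lambda_{q, m_k}$. Define an $MN \times MN$ matrix $G_q$ such that

$$[G_q]_{kj} = \begin{cases} C_j \lambda_j & \text{for } k = 1 \text{ and } 1 \le j \le MN; \\ C_j L & \text{for } 1 < k = j \le MN; \\ 0 & \text{else.} \end{cases} \qquad (18)$$

Define a row vector $\xi_q = [\xi_1, \cdots, \xi_{MN}]$ such that $\xi_k = C_k \varphi_k$
for $k = 1, \cdots, MN$. Define

$$S_q = \Big\{ (y_1, \cdots, y_{MN}) \;\Big|\; y_1 \in \mathbb{Z} \text{ and } -\frac{L}{2} + \varphi_1 \le y_1 < \frac{L}{2} + \varphi_1, \;\; y_k \in \mathbb{Z} \text{ and } -\frac{L}{2} \le L y_k + \lambda_k y_1 - \varphi_k < \frac{L}{2} \text{ for } k = 2, \cdots, MN \Big\}.$$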
Then, by Theorem 3 we have

$$\min_{\ell} \sum_{n=1}^{N} \sum_{m=1}^{M} \left[ (C_{m,n} \lambda_{q,m} \ell - C_{m,n} \varphi_{m,n}) \bmod^* C_{m,n} L \right]^2 = \min_{y \in S_q} \|y G_q - \xi_q\|^2. \qquad (19)$$

It follows from (18) and (19) that

$$\min_{\ell} \|A_q^\dagger X_\tau - \Lambda_q^\ell B_q X_{\tau-1}\|_F^2 \approx \frac{4\pi^2}{L^2} \min_{y \in S_q} \left( \|y G_q - \xi_q\|^2 + \Delta_q \right),$$

leading to

$$\min_{\ell, q} \|X_\tau - A_q \Lambda_q^\ell B_q X_{\tau-1}\|_F^2 \approx \frac{4\pi^2}{L^2} \min_{q} \min_{y \in S_q} \left( \|y G_q - \xi_q\|^2 + \Delta_q \right).$$

Hence, by Theorem 3 the maximum likelihood decoder can be well approximated by

$$(\hat{q}, \hat{\ell}) = (\hat{q}, \; \hat{y}_1 \bmod L)$$

where $\hat{y}_1$ is the first entry of $\hat{y} = [\hat{y}_1, \cdots, \hat{y}_{MN}]$ such that

$$(\hat{q}, \hat{y}) = \arg\min_{q} \min_{y \in S_q} \left( \|y G_q - \xi_q\|^2 + \Delta_q \right).$$

The above analysis shows that the efficiency of the decoding problem (16)
can be enhanced by sequentially applying the simplified sphere decoder algorithm
developed in the last subsection. In the next subsection we shall
improve the efficiency even further by developing a parallel search strategy.



3.5 Parallel Sphere Decoding 

The sequential sphere decoder algorithm introduced in the last subsection
involves $2^b$ independent sphere decoding processes. When the constellation
consists of many blocks (i.e., large $2^b$), the sequential decoding may be too
time consuming, but since the sphere decoding processes are independent, all
searches can be executed in parallel. Specifically, since it has been shown
in the previous subsection that the ML decoding problem (16) can be reformulated
as the sub-optimal decoding problem

$$\arg\min_{q = 0, 1, \cdots, 2^b - 1} \; \min_{y \in S_q} \left( \|y G_q - \xi_q\|^2 + \Delta_q \right),$$

we can apply in parallel the simplified sphere decoder algorithm to investigate
the following $2^b$ spheres:

$$\{ y \in S_q \mid \|y G_q - \xi_q\|^2 < \gamma_q^2 \}, \qquad q = 0, 1, \cdots, 2^b - 1$$

where

$$\gamma_q^2 = \gamma^2 - \Delta_q, \qquad q = 0, 1, \cdots, 2^b - 1 \qquad (20)$$

with parameter $\gamma$ controlling the sizes of all spheres. The choice of the
initial value of $\gamma$ is similar to choosing the initial radius of a conventional
sphere decoder. For a fixed value of $\gamma$, the $2^b$ spheres respectively determine
$2^b$ sets of feasible $y_1$ values based on (13). Let the set for the $q$-th sphere be
denoted as $\mathcal{I}_q$. We investigate the $y_1$ values of these sets in a round robin
order. That is, the sets are visited in the following sequence

$$\mathcal{I}_0, \mathcal{I}_1, \cdots, \mathcal{I}_{2^b - 1}, \; \mathcal{I}_0, \mathcal{I}_1, \cdots, \mathcal{I}_{2^b - 1}, \; \cdots$$

Of course, any value of $y_1$ is eliminated from its corresponding set after
evaluation. Once a value of $y_1$ from set $\mathcal{I}_q$ is found to satisfy (13) and
(14), $\gamma_q$ is reduced to $\sqrt{\mathcal{M}_{MN}}$. Subsequently, $\gamma$ is reduced to $\sqrt{\gamma_q^2 + \Delta_q}$ and the
radii of the other spheres are decreased accordingly by (20). When no value of
$y_1$ from any set satisfies (13) and (14), $\gamma$ is increased and consequently
all the spheres are enlarged based on (20). All the spheres keep enlarging
until a value of $y_1$ satisfying (13) and (14) is detected. Once such a value of
$y_1$ is found, all spheres begin to shrink. The shrinking process is very quick
due to the parallel mechanism. This process is terminated when all these
sets become empty. The solution of decoding problem (16) is given as the
tuple

$$(\hat{q}, \; \hat{y}_1 \bmod L)$$

where $\hat{y}_1 \in \mathcal{I}_{\hat{q}}$ and $\hat{y}_1$ is the last value found to satisfy (13) and (14). It
should be noted that, in this decoding process, only one CPU processor is
needed.
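The round robin mechanism with a shared radius can be sketched as follows. This is an illustrative sketch under several assumptions: each branch $q$ is represented by a tuple `(C, lam, phi, delta)` holding its flattened coefficients and offset $\Delta_q$, `lam[0]` equals 1, the enlargement factor `grow` is arbitrary, and the names `path_metric` and `parallel_sphere_decode` are hypothetical.

```python
import math


def path_metric(y1, C, lam, phi, L, gamma_sq):
    """Sum of squared residuals for this y1 if it stays below gamma_sq, else None."""
    d = y1 - phi[0]
    if not (-L / 2 <= d < L / 2):
        return None
    total = (C[0] * d) ** 2
    for m in range(1, len(C)):
        if total >= gamma_sq:
            return None
        t = lam[m] * y1 - phi[m]
        total += (C[m] * (t - L * math.floor(t / L + 0.5))) ** 2
    return total if total < gamma_sq else None


def parallel_sphere_decode(branches, L, gamma_init, grow=1.5):
    """Round robin search over all branches; returns (q, y1 mod L)."""
    gamma_sq = gamma_init ** 2
    best = None
    ks = [0] * len(branches)                 # zigzag index of each branch
    while True:
        progressed = False
        for q, (C, lam, phi, delta) in enumerate(branches):
            gq_sq = gamma_sq - delta         # radius of sphere q, as in (20)
            if gq_sq <= 0:
                continue
            k_max = 2 * math.ceil(min(math.sqrt(gq_sq) / C[0], L / 2))
            if ks[q] > k_max:
                continue                     # candidate set of branch q exhausted
            progressed = True
            k = ks[q]
            ks[q] += 1
            y1 = math.floor(phi[0] + 0.5) + (-1) ** k * math.ceil(k / 2)
            total = path_metric(y1, C, lam, phi, L, gq_sq)
            if total is not None:
                best = (q, y1 % L)
                gamma_sq = total + delta     # shrinks every sphere at once
        if not progressed:
            if best is not None:
                return best
            gamma_sq *= grow ** 2            # all spheres empty: enlarge and restart
            ks = [0] * len(branches)
```

A point found in one branch immediately tightens the radius of every other branch, which is why the shrinking phase ends after few evaluations even though a single processor visits the branches in turn.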
4 Illustrative Examples

In this paper, we only design constellations for the special structure that
$\Lambda_q = \Lambda$, $A_q = I$ for $q = 0, \cdots, 2^b - 1$. The computational effort has been
significantly reduced by applying Theorem 1. Better codes (with lower bit
error rate but equivalent decoding complexity) can be obtained if we allow
general $A_q$ and $\Lambda_q$. However, the searching time will be substantially increased
if the constellation size is large. Even in this limited case, the new
design paradigm generates unitary space-time constellations which significantly
outperform existing ones. In the following, we show the simulation
results of our codes as compared to existing codes. In the comparison of
bit error rate performance, we have used the Gray code bit mapping for the
orthogonal design and CD codes, and the binary-to-decimal conversion mapping
for our codes, cyclic group codes, FPF codes and products of groups.
The data of our unitary space-time codes are reported in Appendix E. The
details of the orthogonal designs we used in our simulation are provided in
Appendix F.

For the case of two transmit antennas and one receiver antenna, our
computational experience indicates that it is hard to achieve significant
performance improvement upon the orthogonal design proposed in [44]. By
using nonconstant modulus constellations, the performance of [21] further
improves upon that of [44] at the price of the complexity of estimating the
channel power and signal power. However, when the number of transmit
antennas is more than two or the number of receiver antennas is more than
one, the differential detection scheme based on orthogonal designs is subject
to significant performance loss.

We compared the performance of our codes with the differential detection
schemes using orthogonal designs in Figures 3-7. In general, our codes
significantly outperform orthogonal designs at the price of relatively higher
decoding complexity. It can be seen from Figure 3 that, with spectral
efficiency $R = 6$ bits per channel use, our code (with block number 16, i.e.,
$b = 4$) improves upon the orthogonal design by more than 10 dB at block error rate
$10^{-1}$ when using two transmitter antennas and two receiver antennas. It
is shown in Figure 4 that, with spectral efficiency $R = 4$ bits per channel
use, our code improves upon the orthogonal design by about 11 dB at block error
rate $2 \times 10^{-2}$ when using 3 transmitter antennas and one receiver antenna.
It can be seen from Figure 6 that, with spectral efficiency $R = 3$ bits per
channel use, our code improves upon the orthogonal design by about 6 dB at bit
error rate $10^{-3}$ when using 4 transmitter antennas and 2 receiver antennas.
These examples demonstrate that orthogonal designs suffer from a substantial
performance penalty. Such penalty becomes more severe when using multiple
receiver antennas, or using more than two transmit antennas, or operating
at high spectral efficiency.

We compared the performance of our codes with Cayley differential (CD) codes
in Figures 3-7. Figure 3 shows that, with spectral efficiency $R = 6$ bits per
channel use, our code (with block number 16, i.e., $b = 4$) improves upon the
Cayley differential code (reported on page 1495 of [16]) by about 9 dB at block
error rate $6 \times 10^{-2}$ when using two transmitter antennas and two receiver
antennas. The improvements of our codes with block number 4 and 8 are
respectively 4 dB and 7 dB at block error rate $6 \times 10^{-2}$. The data of the CD codes
we used in simulation for Figures 4-7 are not available in the literature. We
followed the design method proposed in [16] to search for the corresponding CD
codes. As described in [16], the performance metric used in the optimization
is the average logarithm determinant $\xi(\mathcal{V})$. The number of data streams $Q$
should be chosen as large as possible under constraint (30) of [16]. For a
given spectral efficiency $R$, once $Q$ is fixed, the set $\mathcal{A}_r$ for $\{\alpha_q\}$ is determined
and is provided in [16]. We obtained CD codes via extensive gradient-based
optimization. The values of the tuple $(Q, \xi)$ for the CD codes corresponding
to Figures 4-7 are, respectively, (4, 0.2610), (8, 0.5832), (12, 0.3619) and
(12, 0.5401). As can be seen from Figures 4-7, the performance of CD codes
is not comparable with that of our codes. However, we can see that CD
codes are generally better than cyclic group codes and orthogonal designs
in terms of bit (or block) error rate performance.

In Figures 4-6, we compared the performance of our proposed codes with
the FPF codes proposed in [39]. It is seen from Figure 4 that, with spectral
efficiency $R = 4$ bits per channel use, our code (with block number 16, i.e.,
$b = 4$) improves upon the product of cyclic groups (see Table IV of [39])
by about 2 dB at block error rate $2 \times 10^{-3}$ when using 3 transmitter antennas
and one receiver antenna. In Figure 5, our code slightly outperforms the
product of groups code. However, our code has a lower decoding complexity
since our code involves only 4 branches of sphere decoding, while the product
of groups code involves 17 branches of sphere decoding. In this case, the
T matrix is not available from [39]. We used the same diagonal elements,
u = [1 3 4 11], as that of [39]. We searched for the best T matrix based on
the conventional criterion of diversity product maximization. We obtained
a T matrix so that the constellation has diversity product 0.3118, which is
greater than the previously known value, 0.3105, reported in [39]. In Figure
6, our code significantly outperforms the product of groups code. Our code
with $b = 4$ improves upon the product of groups code by about 3 dB at bit
error rate $10^{-4}$. Moreover, our code has a lower decoding complexity since
our code involves only 16 branches of sphere decoding, while the product
of groups code involves 65 branches of sphere decoding. In this case, we
used the same diagonal elements, u = [1, 14, 21, 34], as that of [39]. We
searched for the best T matrix based on the conventional criterion of diversity
product maximization. We obtained a T matrix so that the constellation
has diversity product 0.1563, which is greater than the previously known
value, 0.1539, reported in [39].

We compared the performance of our proposed codes with cyclic group
codes in Figures 3, 5 and 6. It is demonstrated that our codes significantly
outperform cyclic group codes. For example, Figure 5 shows that, with
spectral efficiency $R = 2$ bits per channel use, our code (with block number
4, i.e., $b = 2$) improves upon the best previously known cyclic group code
u = [1 25 97 107] (see Table I of [23]) by about 3 dB at bit error rate $10^{-3}$ when
using 4 transmitter antennas and 2 receiver antennas. The cyclic group code
corresponding to Figure 3 is u = [1, 1731] of diversity product 0.0265. The
cyclic group code corresponding to Figure 6 is u = [1, 301, 1561, 1829] of
diversity product 0.1035. We obtained these two cyclic group codes based
on the conventional criterion of diversity product maximization.

In particular, we have presented continuous diagonal codes for many
combinations of antenna numbers and constellation sizes in Table 1 of
Appendix E. These continuous diagonal codes outperform cyclic group codes
in terms of bit error rate. For example, in Figure 7, with spectral
efficiency $R = 2$ bits per channel use, our continuous diagonal code $\lambda$ =
[1 11.8659 404.3640 592.2112 1328.7582 1489.9040] improves upon the
best previously known cyclic group code u = [1 599 623 1445 1527 1715] (see
Table I of [39]) about 1.5 dB at bit error rate 10 when using 6 transmitter 
antennas and 2 receiver antennas. In Figure 7, it is shown that our continuous
diagonal code also substantially outperforms the orthogonal design and the
CD code (with $(Q, \xi) = (12, 0.5401)$ as mentioned before). However, the
product of groups code has much better performance than our continuous
diagonal code. For the product of groups code, we used the same diagonal
elements, u = [1, 9, 21, 51, 53, 57], as that of [39]. We searched for the best T
matrix based on the conventional criterion of diversity product maximization.
We obtained a T matrix so that the constellation has diversity product
0.2098, which is greater than the previously known value, 0.2084, reported
in [39].

It is important to note that, since the constellation size of many types
of FPF codes is not a power of 2, the bit assignment is not trivial and may
significantly increase the bit error rate. The first method of bit assignment is to
truncate the constellation to a smaller one whose size is a power of
2. The drawback of this mapping method is that a large portion of the
signal matrices may be wasted. For example, suppose we have an optimal
(or near optimal) constellation of 240 signal matrices but only 128 of them are
used to convey information. One can argue that it may be better to directly
seek the optimal (or near optimal) constellation of 128 signal matrices. The
second method is to map $n$ consecutive bits into $m$ consecutively transmitted
matrices where $m$ and $n$ are integers large enough so that $2^n$ is close to the
$m$-th power of the constellation size (see pp. 2356-2357 of [39]). Unfortunately,
the bit error rate then becomes the product of the block error rate
and $\eta m$, where $\eta \in (0, 1)$ may not be small. Moreover, the decoding delay is
increased to $mM$ symbol periods, which may be intolerable for large $m$ and
$M$. The increase of bit error rate and decoding delay can be substantial since
the factor $m$ can be quite large. For example, when the constellation size
is 240, the minimal values of the integer $m$ that guarantee $1 < \frac{240^m}{2^n} < 1.05$ and
$1 < \frac{240^m}{2^n} < 1.01$ are respectively 10 and 118. It can be seen from the above
analysis that for practical purposes the size of the signal constellation should be
a power of 2. This is one of the reasons why we choose code parameters to
be continuous so that we can find unitary space-time constellations of any
size.
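The two values of $m$ quoted above can be checked with exact integer arithmetic. The sketch below assumes the condition is $1 < \text{size}^m / 2^n < \text{bound}$ with $n$ the largest integer such that $2^n \le \text{size}^m$; the function name `minimal_block_count` is hypothetical, and the loop does not terminate if the size is itself a power of 2.

```python
from fractions import Fraction


def minimal_block_count(size, ratio_bound):
    """Smallest m (with its n) such that 1 < size**m / 2**n < ratio_bound.

    ratio_bound is passed as a string such as "1.05" so it is parsed exactly.
    """
    bound = Fraction(ratio_bound)
    m = 1
    while True:
        power = size ** m
        n = power.bit_length() - 1          # largest n with 2**n <= size**m
        ratio = Fraction(power, 1 << n)
        if 1 < ratio < bound:
            return m, n
        m += 1
```

For a constellation of size 240 this gives $m = 10$ (with $n = 79$) for the bound 1.05 and $m = 118$ (with $n = 933$) for the bound 1.01, so the decoding delay grows to 10 or 118 blocks respectively.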

Finally, we would like to point out that some of the constellations we
obtained have zero diversity product. However, these constellations significantly
outperform other constellations with much larger diversity product.
Such constellations can be found in Appendix E for the following
combinations: (i) $M = N = 2$, $b = 2$; (ii) $M = N = 2$, $b = 3$; (iii)
$M = 4$, $N = 2$, $b = 2$. Compared with the diversity product, our computational
experiments indicate that the trapezoid criterion introduced in
Section 2 works quite well even in the low SNR region.



5 Conclusion 

We have proposed a new class of differential unitary space-time codes which
has high performance and low encoding and decoding complexity. We have
established a parallel sphere decoder algorithm which efficiently decodes our
proposed code and existing codes such as the cyclic group code, FPF code $G_{m,r}$,
non-group code $S_{m,s}$ and products of groups. We have proposed a new design
criterion and powerful optimization techniques for designing unitary space-time
codes. We have obtained constellations which significantly improve
upon constellations reported in the literature.

Figure 3: Performance simulations of constellations

Figure 4: Performance simulations of constellations

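A PROOF OF THEOREM 1

From the illustration after Theorem 2 we see that the Chernoff bound of
the pair-wise error probability is invariant under unitary transforms. By
this invariance property, we have

$$P(A_p \Lambda_p^{\ell} B_p, A_p \Lambda_p^{\ell'} B_p) = P(I, \Lambda^{\ell' - \ell}), \qquad 0 \le p \le 2^b - 1.$$

Note that

$$d_H(p, p, \ell, \ell') = d_H(\ell, \ell')$$

for $0 \le p \le 2^b - 1$. Hence

$$\begin{aligned}
\sum_{p=0}^{2^b - 1} \sum_{\ell=0}^{L-2} \sum_{\ell'=\ell+1}^{L-1} d_H(p, p, \ell, \ell') \, P(A_p \Lambda_p^{\ell} B_p, A_p \Lambda_p^{\ell'} B_p)
&= 2^b \sum_{\ell=0}^{L-2} \sum_{\ell'=\ell+1}^{L-1} d_H(\ell, \ell') \, P(I, \Lambda^{\ell' - \ell}) \\
&= 2^b \sum_{k=1}^{L-1} \sum_{\substack{\ell' - \ell = k \\ 0 \le \ell \le L-2 \\ \ell' \le L-1}} d_H(\ell', \ell) \, P(I, \Lambda^k). \qquad (21)
\end{aligned}$$

It can be verified that

$$\sum_{\substack{\ell' - \ell = k \\ 0 \le \ell \le L-2 \\ \ell' \le L-1}} d_H(\ell', \ell) = \sum_{0 \le \ell \le L-k-1} d_H(\ell + k, \ell) = w(k). \qquad (22)$$

By (21) and (22),

$$\sum_{p=0}^{2^b - 1} \sum_{\ell=0}^{L-2} \sum_{\ell'=\ell+1}^{L-1} d_H(p, p, \ell, \ell') \, P(A_p \Lambda_p^{\ell} B_p, A_p \Lambda_p^{\ell'} B_p) = 2^b \sum_{k=1}^{L-1} w(k) \, P(I, \Lambda^k). \qquad (23)$$

Observing that

$$d_H(p, q, \ell, \ell') = d_H(\ell, \ell') + d_H(p, q)$$

and

$$P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q) = P(B_p, \Lambda^{\ell' - \ell} B_q),$$

we have

$$\begin{aligned}
\sum_{\ell=0}^{L-1} \sum_{\ell'=0}^{L-1} d_H(p, q, \ell, \ell') \, P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q)
&= \sum_{\ell=0}^{L-1} \sum_{\ell'=0}^{L-1} d_H(\ell, \ell') \, P(B_p, \Lambda^{\ell' - \ell} B_q)
 + \sum_{\ell=0}^{L-1} \sum_{\ell'=0}^{L-1} d_H(p, q) \, P(B_p, \Lambda^{\ell' - \ell} B_q) \\
&= \sum_{k=-L+1}^{L-1} \sum_{\substack{\ell' - \ell = k \\ 0 \le \ell \le L-1 \\ 0 \le \ell' \le L-1}} d_H(\ell, \ell') \, P(B_p, \Lambda^k B_q)
 + d_H(p, q) \sum_{k=-L+1}^{L-1} P(B_p, \Lambda^k B_q). \qquad (24)
\end{aligned}$$

Making use of symmetry, we can show that

$$\sum_{\substack{\ell' - \ell = k \\ 0 \le \ell \le L-1 \\ 0 \le \ell' \le L-1}} d_H(\ell, \ell') = w(|k|). \qquad (25)$$

By (24) and (25),

$$\sum_{\ell=0}^{L-1} \sum_{\ell'=0}^{L-1} d_H(p, q, \ell, \ell') \, P(A_p \Lambda_p^{\ell} B_p, A_q \Lambda_q^{\ell'} B_q) = \sum_{k=-L+1}^{L-1} \left[ w(|k|) + d_H(p, q) \right] P(B_p, \Lambda^k B_q). \qquad (26)$$

The proof is finally completed by invoking equations (2), (23) and (26).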
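B PROOF OF THEOREM 2

By virtue of the Chernoff bound (1),

$$P(U, \Phi) = \frac{1}{2} \prod_{m=1}^{M} \left[ 1 + \frac{\rho^2 \sigma_m^2}{4(1 + 2\rho)} \right]^{-N} = \frac{\alpha^{MN}}{2} \left[ \prod_{m=1}^{M} (\alpha + \sigma_m^2) \right]^{-N}$$

where $\sigma_m$ is the $m$-th singular value of $U - \Phi$. Let $U_1$ and $U_2$ be unitary
matrices such that $U - \Phi = U_1 \, \mathrm{diag}(\sigma_1, \cdots, \sigma_M) \, U_2$. Then

$$\begin{aligned}
\det[\alpha I + (U - \Phi)(U - \Phi)^\dagger]
&= \det[\alpha I + U_1 \, \mathrm{diag}(\sigma_1^2, \cdots, \sigma_M^2) \, U_1^\dagger] \\
&= \det[U_1 \, \mathrm{diag}(\alpha + \sigma_1^2, \cdots, \alpha + \sigma_M^2) \, U_1^\dagger] \\
&= \det(U_1 U_1^\dagger) \det[\mathrm{diag}(\alpha + \sigma_1^2, \cdots, \alpha + \sigma_M^2)] \\
&= \prod_{m=1}^{M} (\alpha + \sigma_m^2). \qquad (27)
\end{aligned}$$

It follows that

$$P(U, \Phi) = \frac{\alpha^{MN}}{2 \left( \det[\alpha I + (U - \Phi)(U - \Phi)^\dagger] \right)^N} \qquad (28)$$

from which we obtain

$$P(I, \Phi) = \frac{\alpha^{MN}}{2 \left( \det[\alpha I + (I - \Phi)(I - \Phi)^\dagger] \right)^N} = \frac{\alpha^{MN}}{2 \left( \det[(\alpha + 2) I - \Phi - \Phi^\dagger] \right)^N}$$

by letting $U = I$. This proves the expression for $P(I, \Phi)$.
Now define

$$\Xi = \log\det[\alpha I + (U - \Phi)(U - \Phi)^\dagger] = \log\det[(\alpha + 2) I - U \Phi^\dagger - \Phi U^\dagger]. \qquad (29)$$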
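By the chain rule of differentiation,

$$\frac{\partial P(U, \Phi)}{\partial \phi_{pq}} = \frac{\partial P(U, \Phi)}{\partial \Xi} \frac{\partial \Xi}{\partial \phi_{pq}} = \frac{\alpha^{MN}}{2} \frac{\partial \exp(-N \Xi)}{\partial \Xi} \frac{\partial \Xi}{\partial \phi_{pq}} = -N P(U, \Phi) \frac{\partial \Xi}{\partial \phi_{pq}}. \qquad (30)$$

Similarly,

$$\frac{\partial P(U, \Phi)}{\partial \sigma_{pq}} = -N P(U, \Phi) \frac{\partial \Xi}{\partial \sigma_{pq}}. \qquad (31)$$

Define

$$\Omega = (\alpha + 2) I - U \Phi^\dagger - \Phi U^\dagger. \qquad (32)$$

By (27) and (29),

$$\det(\Omega) = \prod_{m=1}^{M} (\alpha + \sigma_m^2) \ge \alpha^M > 0$$

for any $U$. Let $e_j$ be the $M$-dimensional unit column vector with a one in
the $j$-th entry and zeros elsewhere. By the same method of proving (27), we
can show that $\det[(\alpha + 2) I - (U + e_j e_k^\top \delta) \Phi^\dagger - \Phi (U^\dagger + e_k e_j^\top \delta)]$ is a positive
real number for any $\delta \in \mathbb{R}$. Since $\det(\Omega)$ is positive and

$$\det[(\alpha + 2) I - (U + e_j e_k^\top \delta) \Phi^\dagger - \Phi (U^\dagger + e_k e_j^\top \delta)]
= \det(\Omega) \det[I - \Omega^{-1} (e_j e_k^\top \Phi^\dagger + \Phi e_k e_j^\top) \delta],$$

we have that $\det[I - \Omega^{-1} (e_j e_k^\top \Phi^\dagger + \Phi e_k e_j^\top) \delta]$ is also a positive real number
for any $\delta \in \mathbb{R}$. Therefore,

$$\log\det[(\alpha + 2) I - (U + e_j e_k^\top \delta) \Phi^\dagger - \Phi (U^\dagger + e_k e_j^\top \delta)]
= \log\det(\Omega) + \log\det[I - \Omega^{-1} (e_j e_k^\top \Phi^\dagger + \Phi e_k e_j^\top) \delta].$$

Let $\Psi = \Omega^{-1} (e_j e_k^\top \Phi^\dagger + \Phi e_k e_j^\top)$. Then $\Psi$ is a Hermite matrix, i.e., $\Psi = \Psi^\dagger$.
It follows that $[\Psi]_{kk}$ is real for $k = 1, \cdots, M$. By the definition of a
determinant, we have

$$\det(I - \Psi \delta) = \prod_{k=1}^{M} (1 - [\Psi]_{kk} \delta) + \delta^2 f(\delta),$$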
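where $f(\cdot)$ is a polynomial function of $\delta \in \mathbb{R}$. Since $\det(I - \Psi \delta)$ and
$[\Psi]_{kk}$, $k = 1, \cdots, M$ are real numbers, it must be true that $f(\delta)$ is also
a real-valued function of $\delta \in \mathbb{R}$. Note that

$$\det(I - \Psi \delta) = 1 - \left( \sum_{k=1}^{M} [\Psi]_{kk} \right) \delta + O(\delta^2) = 1 - \mathrm{tr}(\Psi) \delta + O(\delta^2) > 0$$

where $\mathrm{tr}(\Psi)$ is real and $O(\delta^2)$ is a real-valued function of $\delta \in \mathbb{R}$. Therefore,

$$\log\det(I - \Psi \delta) = -\mathrm{tr}(\Psi) \delta + O(\delta^2). \qquad (33)$$

Making use of (33), we have

$$\begin{aligned}
&\log\det[(\alpha + 2) I - (U + e_j e_k^\top \delta) \Phi^\dagger - \Phi (U^\dagger + e_k e_j^\top \delta)] \\
&\quad= \log\det(\Omega) - \mathrm{tr}(\Psi) \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - \left[ \mathrm{tr}\left( e_j e_k^\top \Phi^\dagger \Omega^{-1} \right) + \mathrm{tr}\left( \Omega^{-1} \Phi e_k e_j^\top \right) \right] \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - \left[ \mathrm{tr}\left( e_j e_k^\top (\Omega^{-1} \Phi)^\dagger \right) + \mathrm{tr}\left( \Omega^{-1} \Phi e_k e_j^\top \right) \right] \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - \left[ \mathrm{tr}\left( (\Omega^{-1} \Phi e_k e_j^\top)^\dagger \right) + \mathrm{tr}\left( \Omega^{-1} \Phi e_k e_j^\top \right) \right] \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - \left[ \left( \mathrm{tr}\left( \Omega^{-1} \Phi e_k e_j^\top \right) \right)^\dagger + \mathrm{tr}\left( \Omega^{-1} \Phi e_k e_j^\top \right) \right] \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - 2 \, \Re\left( \mathrm{tr}\left( \Omega^{-1} \Phi e_k e_j^\top \right) \right) \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - 2 \, \Re\left( [\Omega^{-1} \Phi]_{jk} \right) \delta + O(\delta^2) \\
&\quad= \log\det(\Omega) - 2 \, [\Re(\Omega^{-1} \Phi)]_{jk} \, \delta + O(\delta^2)
\end{aligned}$$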

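for any $\delta \in \mathbb{R}$. Therefore, applying the formula

$$\frac{\partial f(X)}{\partial \Re([X]_{jk})} = \lim_{\delta \to 0} \frac{f(X + e_j e_k^\top \delta) - f(X)}{\delta}$$

provided in [16] (page 1501), we have

$$\begin{aligned}
\frac{\partial \Xi}{\partial \Re([U]_{jk})}
&= \lim_{\delta \to 0} \frac{\log\det[\Omega - (e_j e_k^\top \Phi^\dagger + \Phi e_k e_j^\top) \delta] - \log\det(\Omega)}{\delta} \\
&= \lim_{\delta \to 0} \frac{-2 \, [\Re(\Omega^{-1} \Phi)]_{jk} \, \delta + O(\delta^2)}{\delta} \\
&= -2 \, [\Re(\Omega^{-1} \Phi)]_{jk}.
\end{aligned}$$

Observing that $U = I$ for $\Theta = 0$ (i.e., all elements of $\Theta$ are zeros), we have
$\Omega = (\alpha + 2) I - \Phi - \Phi^\dagger$ and $\Omega^{-1} \Phi = [(\alpha + 2) I - \Phi - \Phi^\dagger]^{-1} \Phi = Q$. Hence,

$$\left. \frac{\partial \Xi}{\partial \Re(U)} \right|_{\Theta = 0} = -2 \, \Re(Q).$$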

Similarly, 



logdet[(a + 2)1 - (U + e je T k 5i)& - $(17* - e fc eJ<Ji)] 
= log det[fi - (eje^ f - $e fc ej)<5i] 
= logdet(n) + logdet[7 - fi _1 (e.,-e£$t - $e fc eJ)<K] 
= logdet(O) - tr^-^e^e^t _ §e k e])5i] + 0{5 2 ) 
= logdet(n) 

- tr (ejel&tl- 1 } - tr (Vr^efcej) Si + 0{5 2 ) 
= logdet(fi) 

- tr (e^fi" 1 *)*) - tr (fT^e})] & + 0(,5 2 ) 
= logdet(fi) 

- tr^n-^efcej) ) - tr (jr^e}) <ft + 0(<5 2 ) 

= logdet(fi) 

- (tr (Vr 1 ^}))* - tr (V^ej)] tfi + 0(5 2 ) 

= log det(fi) - 2 3(tr (fi-^efcej) ) 5 + 0(5 2 ) 



= logdet(J)) -2 3([fi _1 $] jfc ) 5 + 0(5 2 ) 
= logdet(fi) - 2 [^(fr 1 ^)]^ 5 + 0(5 2 ) 



for any <5 G R. Therefore, 



(34) 



42 



lim 

lim • 

8^0 



logdet[0 - (e,e7$t - <5>e k e])5i\ - logdet(Q) 



for G = 0, which implies that 

d E 



d%(U) 



-29f(Q). 



(35) 



e=o 
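The two gradient formulas (34) and (35) can be checked against central finite differences. The following is a minimal sketch, assuming $a = 1$, a random $3\times 3$ unitary $\Phi$, and the function $\log\det[(a+2)I - U\Phi^\dagger - \Phi U^\dagger]$ perturbed entrywise exactly as in the derivation above; all function names are hypothetical.

```python
import numpy as np

def random_unitary(m, rng):
    # QR factorisation of a complex Gaussian matrix yields a unitary factor.
    z = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    q, _ = np.linalg.qr(z)
    return q

def xi(u, phi, a):
    """log det[(a+2)I - U Phi^H - Phi U^H]; the argument is Hermitian for any U."""
    m = phi.shape[0]
    omega = (a + 2) * np.eye(m) - u @ phi.conj().T - phi @ u.conj().T
    _, logdet = np.linalg.slogdet(omega)
    return logdet

def analytic_gradients(phi, a):
    """-2 Re(Q) and -2 Im(Q) with Q = [(a+2)I - Phi - Phi^H]^{-1} Phi."""
    m = phi.shape[0]
    omega = (a + 2) * np.eye(m) - phi - phi.conj().T
    q = np.linalg.solve(omega, phi)
    return -2 * q.real, -2 * q.imag

def numeric_gradients(phi, a, h=1e-6):
    """Central differences with respect to Re and Im of each entry of U at U = I."""
    m = phi.shape[0]
    u0 = np.eye(m, dtype=complex)
    g_re = np.zeros((m, m))
    g_im = np.zeros((m, m))
    for j in range(m):
        for k in range(m):
            e = np.zeros((m, m), dtype=complex)
            e[j, k] = 1.0
            g_re[j, k] = (xi(u0 + h * e, phi, a) - xi(u0 - h * e, phi, a)) / (2 * h)
            g_im[j, k] = (xi(u0 + 1j * h * e, phi, a) - xi(u0 - 1j * h * e, phi, a)) / (2 * h)
    return g_re, g_im

rng = np.random.default_rng(1)
phi = random_unitary(3, rng)
a = 1.0  # assumed value of the constant a (any a > 0 keeps Omega positive definite)
num_re, num_im = numeric_gradients(phi, a)
ana_re, ana_im = analytic_gradients(phi, a)
max_err = max(np.abs(num_re - ana_re).max(), np.abs(num_im - ana_im).max())
print("largest deviation:", max_err)
```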



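We now consider the partial derivatives of $U$ with respect to the elements of $\Theta$. It should be noted that an incorrect formula for computing $\left.\frac{\partial U}{\partial\phi_{pq}}\right|_{\Theta=0}$ has been reported in [1] (see equation (13), page 2625). In the sequel, we shall prove that

$$\left.\frac{\partial U}{\partial\phi_{pq}}\right|_{\Theta=0} = e_q e_p^T - e_p e_q^T, \tag{36}$$

which is clearly different from equation (13) of [1]. To that end, we can use the parameterization of unitary matrix $U(\Theta)$ to verify that

$$\left.\frac{\partial U}{\partial\phi_{pq}}\right|_{\Theta=0} = \left.\frac{\partial U^{pq}}{\partial\phi_{pq}}\right|_{\phi_{pq}=0},$$

where

$$U^{pq} = \begin{bmatrix} I_{(p-1)\times(p-1)} & 0_{(p-1)\times(M-p+1)}\\ 0_{(M-p+1)\times(p-1)} & U^{p,q}(\phi_{pq},0)\end{bmatrix}$$

with

$$[U^{p,q}(\phi_{pq},0)]_{jk} = \begin{cases} 1, & \text{if } j = k \text{ and } j \notin \{1,\, q-p+1\}\\ \cos(\phi_{pq}), & \text{if } j = k \text{ and } j \in \{1,\, q-p+1\}\\ -\sin(\phi_{pq}), & \text{if } j = 1 \text{ and } k = q-p+1\\ \sin(\phi_{pq}), & \text{if } k = 1 \text{ and } j = q-p+1\\ 0, & \text{otherwise.}\end{cases}$$

Obviously,

$$\left[\left.\frac{\partial U^{p,q}(\phi_{pq},0)}{\partial\phi_{pq}}\right|_{\phi_{pq}=0}\right]_{jk} = \begin{cases} -1, & \text{if } j = 1 \text{ and } k = q-p+1\\ 1, & \text{if } k = 1 \text{ and } j = q-p+1\\ 0, & \text{otherwise.}\end{cases}$$

Hence, (36) can be obtained by observing that

$$[U^{pq}]_{qp} = [U^{p,q}(\phi_{pq},0)]_{q-p+1,\,1}.$$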
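Formula (36) is easy to confirm numerically for a single plane rotation. The sketch below is an illustration only: it builds the real rotation acting on coordinates $p$ and $q$ (the case in which all other parameters are zero) and differentiates it at $\phi_{pq} = 0$ by central differences. The helper names and the choice $M = 4$, $p = 2$, $q = 4$ are assumptions.

```python
import numpy as np

def plane_rotation(m, p, q, phi):
    """M x M rotation by angle phi in the (p, q) plane, 1-based indices, p < q."""
    u = np.eye(m)
    c, s = np.cos(phi), np.sin(phi)
    u[p - 1, p - 1] = c
    u[q - 1, q - 1] = c
    u[p - 1, q - 1] = -s   # entry (p, q)
    u[q - 1, p - 1] = s    # entry (q, p)
    return u

def derivative_at_zero(m, p, q, h=1e-6):
    """Central-difference derivative of the rotation with respect to phi at phi = 0."""
    return (plane_rotation(m, p, q, h) - plane_rotation(m, p, q, -h)) / (2 * h)

def unit_vector(m, j):
    e = np.zeros(m)
    e[j - 1] = 1.0
    return e

m, p, q = 4, 2, 4
e_p, e_q = unit_vector(m, p), unit_vector(m, q)
expected = np.outer(e_q, e_p) - np.outer(e_p, e_q)   # e_q e_p^T - e_p e_q^T
numeric = derivative_at_zero(m, p, q)
print(np.round(numeric, 6))
```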

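To compute other partial derivatives of $U(\Theta)$ at $\Theta = 0$, we quote equations (14) and (15) of [1] as follows:

$$\left.\frac{\partial U}{\partial\vartheta_{pq}}\right|_{\Theta=0} = 0, \qquad \left.\frac{\partial U}{\partial\theta_k}\right|_{\Theta=0} = i\,e_k e_k^T. \tag{37}$$

By virtue of (36) and (37),

$$\left.\frac{\partial\Re(U)}{\partial\phi_{pq}}\right|_{\Theta=0} = e_q e_p^T - e_p e_q^T, \qquad \left.\frac{\partial\Im(U)}{\partial\phi_{pq}}\right|_{\Theta=0} = 0, \tag{38}$$

$$\left.\frac{\partial\Re(U)}{\partial\vartheta_{pq}}\right|_{\Theta=0} = 0, \qquad \left.\frac{\partial\Im(U)}{\partial\vartheta_{pq}}\right|_{\Theta=0} = 0, \tag{39}$$

$$\left.\frac{\partial\Re(U)}{\partial\theta_k}\right|_{\Theta=0} = 0, \qquad \left.\frac{\partial\Im(U)}{\partial\theta_k}\right|_{\Theta=0} = e_k e_k^T. \tag{40}$$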



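We now define inner product $\langle\cdot,\cdot\rangle$ by

$$\langle X, Y\rangle \stackrel{\mathrm{def}}{=} \sum_{j,k}[X]_{jk}[Y]_{jk}.$$

Then by the chain rule of differentiation and equations (34), (38), we have

$$\begin{aligned}
\frac{\partial\Xi}{\partial\phi_{pq}} &= \left\langle\frac{\partial\Xi}{\partial\Re(U)}, \frac{\partial\Re(U)}{\partial\phi_{pq}}\right\rangle + \left\langle\frac{\partial\Xi}{\partial\Im(U)}, \frac{\partial\Im(U)}{\partial\phi_{pq}}\right\rangle\\
&= \left\langle\frac{\partial\Xi}{\partial\Re(U)}, \frac{\partial\Re(U)}{\partial\phi_{pq}}\right\rangle\\
&= \left\langle -2\,\Re(Q),\ e_q e_p^T - e_p e_q^T\right\rangle\\
&= -2\,\Re\!\left(\left\langle Q,\ e_q e_p^T - e_p e_q^T\right\rangle\right)\\
&= -2\,\Re\!\left([Q]_{qp} - [Q]_{pq}\right).
\end{aligned}$$

Invoking (31) yields

$$\frac{\partial P(U,\Phi)}{\partial\phi_{pq}} = -N\,P(U,\Phi)\,\frac{\partial\Xi}{\partial\phi_{pq}} = 2N\,P(U,\Phi)\,\Re\!\left([Q]_{qp} - [Q]_{pq}\right)$$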

and hence proves ((3]). 

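By the chain rule of differentiation and (39),

$$\frac{\partial\Xi}{\partial\vartheta_{pq}} = \left\langle\frac{\partial\Xi}{\partial\Re(U)}, \frac{\partial\Re(U)}{\partial\vartheta_{pq}}\right\rangle + \left\langle\frac{\partial\Xi}{\partial\Im(U)}, \frac{\partial\Im(U)}{\partial\vartheta_{pq}}\right\rangle = 0.$$

Combining (30) with the equation above leads to

$$\frac{\partial P(U,\Phi)}{\partial\vartheta_{pq}} = 0$$

and thus completes the proof of (6).

By the chain rule of differentiation and (40),

$$\begin{aligned}
\frac{\partial\Xi}{\partial\theta_k} &= \left\langle\frac{\partial\Xi}{\partial\Re(U)}, \frac{\partial\Re(U)}{\partial\theta_k}\right\rangle + \left\langle\frac{\partial\Xi}{\partial\Im(U)}, \frac{\partial\Im(U)}{\partial\theta_k}\right\rangle\\
&= \left\langle\frac{\partial\Xi}{\partial\Im(U)}, \frac{\partial\Im(U)}{\partial\theta_k}\right\rangle\\
&= \left\langle -2\,\Im(Q),\ e_k e_k^T\right\rangle\\
&= -2\,\Im([Q]_{kk}).
\end{aligned}$$

Hence by (32), we have

$$\frac{\partial P(U,\Phi)}{\partial\theta_k} = -N\,P(U,\Phi)\,\frac{\partial\Xi}{\partial\theta_k} = 2N\,P(U,\Phi)\,\Im([Q]_{kk})$$

and completes the proof of (5).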
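Define

$$T \stackrel{\mathrm{def}}{=} \log\det[aI + (\Lambda^\ell - \Phi)(\Lambda^\ell - \Phi)^\dagger].$$

By the same method as computing $\frac{\partial\Xi}{\partial\Im(U)}$, we have

$$\left.\frac{\partial T}{\partial\Im(\Lambda^\ell)}\right|_{\Lambda=I} = -2\,\Im(Q).$$

Observing that $[\Lambda^\ell]_{jk}$ depends on $\lambda_m$ only if $j = k = m$ and that

$$\left.\frac{\partial\Re([\Lambda^\ell]_{mm})}{\partial\lambda_m}\right|_{\Lambda=I} = \left.\frac{\partial\cos\!\left(\frac{2\pi\ell\lambda_m}{L}\right)}{\partial\lambda_m}\right|_{\Lambda=I} = 0, \qquad \left.\frac{\partial\Im([\Lambda^\ell]_{mm})}{\partial\lambda_m}\right|_{\Lambda=I} = \left.\frac{\partial\sin\!\left(\frac{2\pi\ell\lambda_m}{L}\right)}{\partial\lambda_m}\right|_{\Lambda=I} = \frac{2\pi\ell}{L},$$

we have

$$\left.\frac{\partial\Re(\Lambda^\ell)}{\partial\lambda_m}\right|_{\Lambda=I} = 0, \tag{42}$$

$$\left.\frac{\partial\Im(\Lambda^\ell)}{\partial\lambda_m}\right|_{\Lambda=I} = \frac{2\pi\ell}{L}\,e_m e_m^T. \tag{43}$$

By the chain rule of differentiation and equations (42), (43), we have

$$\begin{aligned}
\frac{\partial T}{\partial\lambda_m} &= \left\langle\frac{\partial T}{\partial\Re(\Lambda^\ell)}, \frac{\partial\Re(\Lambda^\ell)}{\partial\lambda_m}\right\rangle + \left\langle\frac{\partial T}{\partial\Im(\Lambda^\ell)}, \frac{\partial\Im(\Lambda^\ell)}{\partial\lambda_m}\right\rangle\\
&= \left\langle\frac{\partial T}{\partial\Im(\Lambda^\ell)}, \frac{\partial\Im(\Lambda^\ell)}{\partial\lambda_m}\right\rangle\\
&= \left\langle -2\,\Im(Q),\ \frac{2\pi\ell}{L}\,e_m e_m^T\right\rangle\\
&= -\frac{4\pi\ell}{L}\,\Im([Q]_{mm}).
\end{aligned}$$

It follows that

$$\frac{\partial P(\Lambda^\ell,\Phi)}{\partial\lambda_m} = -N\,P(\Lambda^\ell,\Phi)\,\frac{\partial T}{\partial\lambda_m} = \frac{4\pi\ell N}{L}\,P(\Lambda^\ell,\Phi)\,\Im([Q]_{mm})$$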



and is true. 



C PROOF OF THEOREM 3

First we need to prove some preliminary results.

Lemma 1 For any $\ell \in \{0, 1, \dots, L-1\}$, there exists $y \in S$ such that

$$\sum_{m=1}^{M}\left[(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L\right]^2 = \|yG - \xi\|^2. \tag{44}$$
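The left-hand side of the lemma is the one-dimensional decoding metric, so a small sketch may help fix ideas. It assumes that the symmetric modulus $x\ \mathrm{mod}^{*}\ m$ returns the representative of $x$ in $[-m/2, m/2)$, which is the convention used in the proofs below, and it evaluates the metric by exhaustive search over $\ell$. The toy values of `C`, `lam` and `phi` are assumptions chosen for illustration.

```python
import math

def sym_mod(x, m):
    """Symmetric modulus: the value congruent to x modulo m lying in [-m/2, m/2)."""
    return x - m * math.floor(x / m + 0.5)

def metric(ell, C, lam, phi, L):
    """Sum over m of [(C_m*lam_m*ell - C_m*phi_m) mod* (C_m*L)]^2."""
    return sum(sym_mod(c * (l * ell - p), c * L) ** 2
               for c, l, p in zip(C, lam, phi))

def decode(C, lam, phi, L):
    """Exhaustive one-dimensional search for the minimising ell in {0, ..., L-1}."""
    return min(range(L), key=lambda ell: metric(ell, C, lam, phi, L))

# Toy, noise-free example (all numbers are assumptions).
L = 16
C = [1.0, 0.5]
lam = [1.0, 3.7]          # lam[0] = 1 as in the lemma
ell_true = 5
phi = [l * ell_true for l in lam]
ell_hat = decode(C, lam, phi, L)
print(ell_hat)
```

The exhaustive search costs $O(LM)$ operations; the point of the lattice reformulation in the lemma is that a sphere decoder can avoid visiting every $\ell$.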
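Proof. Given $\ell \in \{0, 1, \dots, L-1\}$, define

$$y_1 = \ell + \left\lceil\frac{\varphi_1 - \ell}{L} - \frac{1}{2}\right\rceil L. \tag{45}$$

We claim that

$$-\frac{L}{2} + \varphi_1 \le y_1 < \frac{L}{2} + \varphi_1. \tag{46}$$

To prove (46), one can make use of the observation that

$$0 \le \lceil x\rceil - x < 1 \quad \forall x \in \mathbb{R}$$

and verify that inequality

$$0 \le \left\lceil\frac{\varphi_1 - \ell}{L} - \frac{1}{2}\right\rceil - \left(\frac{\varphi_1 - \ell}{L} - \frac{1}{2}\right) < 1$$

is equivalent to

$$-\frac{L}{2} + \varphi_1 \le \ell + \left\lceil\frac{\varphi_1 - \ell}{L} - \frac{1}{2}\right\rceil L < \frac{L}{2} + \varphi_1.$$

The truth of (46) allows one to choose $y = [y_1, \dots, y_M] \in S$ such that the first entry of $y$ is $y_1$. Let

$$w = [w_1, \dots, w_M] = yG - \xi. \tag{47}$$

Obviously, to show (44), it suffices to show

$$(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L = w_m, \quad m = 1, \dots, M,$$

where $\lambda_1 = 1$. By the definitions of $G$ and $\xi$, we can rewrite (47) as

$$w_1 = C_1 y_1 - C_1\varphi_1,$$

$$w_m = C_m\lambda_m y_1 + C_m L y_m - C_m\varphi_m \quad \text{for } m = 2, \dots, M.$$

Hence, to show (44), it suffices to show

$$(C_1\ell - C_1\varphi_1)\ \mathrm{mod}^{*}\ C_1 L = C_1 y_1 - C_1\varphi_1 \tag{48}$$

and, for $m = 2, \dots, M$,

$$(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L = C_m\lambda_m y_1 + C_m L y_m - C_m\varphi_m. \tag{49}$$

Note that, for any $\ell$, there exists a unique integer $z_1$ such that

$$(C_1\ell - C_1\varphi_1)\ \mathrm{mod}^{*}\ C_1 L = C_1\ell - C_1\varphi_1 + z_1 C_1 L.$$

Therefore, to show (48), it suffices to show

$$C_1 y_1 - C_1\varphi_1 = C_1\ell - C_1\varphi_1 + z_1 C_1 L,$$

or equivalently,

$$z_1 = \frac{y_1 - \ell}{L}. \tag{50}$$

By the definition of the symmetric modulus operator $\mathrm{mod}^{*}$, integer $z_1$ guarantees

$$-\frac{C_1 L}{2} \le C_1\ell - C_1\varphi_1 + z_1 C_1 L < \frac{C_1 L}{2},$$

or equivalently,

$$-1 < \frac{\varphi_1 - \ell}{L} - \frac{1}{2} - z_1 \le 0,$$

which implies

$$z_1 - 1 < \frac{\varphi_1 - \ell}{L} - \frac{1}{2} \le z_1.$$

Since $z_1$ is an integer, we have

$$z_1 = \left\lceil\frac{\varphi_1 - \ell}{L} - \frac{1}{2}\right\rceil = \frac{y_1 - \ell}{L},$$

where the second equality follows from (45). So equation (48) is proven by invoking (50).

In light of the fact that, for any given $\ell \in \{0, 1, \dots, L-1\}$ and for any $m \in \{2, \dots, M\}$, there exists a unique integer $z_m$ such that

$$(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L = C_m(\lambda_m\ell - \varphi_m + z_m L),$$

to show (49) it suffices to prove that

$$C_m(\lambda_m\ell - \varphi_m + z_m L) = C_m\lambda_m y_1 + C_m L y_m - C_m\varphi_m$$

for $m = 2, \dots, M$, or equivalently,

$$y_m = z_m - \frac{y_1 - \ell}{L}\,\lambda_m, \quad m = 2, \dots, M. \tag{51}$$

By the definition of the symmetric modulus operator $\mathrm{mod}^{*}$, integer $z_m$ guarantees

$$-\frac{C_m L}{2} \le C_m\lambda_m\ell - C_m\varphi_m + z_m C_m L < \frac{C_m L}{2},$$

which can be rewritten as

$$-\frac{C_m L}{2} \le C_m\left[\lambda_m(\ell + z_1 L) - \varphi_m + (z_m - z_1\lambda_m)L\right] < \frac{C_m L}{2},$$

i.e.,

$$-1 < \frac{\varphi_m}{L} - \left(\frac{\ell + z_1 L}{L} - z_1\right)\lambda_m - \frac{1}{2} - z_m \le 0$$

for $m = 2, \dots, M$. Therefore,

$$z_m - 1 < \frac{\varphi_m}{L} - \left(\frac{\ell + z_1 L}{L} - z_1\right)\lambda_m - \frac{1}{2} \le z_m$$

for $m = 2, \dots, M$. Since $z_m$ is an integer, we have

$$z_m = \left\lceil\frac{\varphi_m}{L} - \left(\frac{\ell + z_1 L}{L} - z_1\right)\lambda_m - \frac{1}{2}\right\rceil = \left\lceil\frac{\varphi_m}{L} - \left(\frac{y_1}{L} - z_1\right)\lambda_m - \frac{1}{2}\right\rceil \tag{52}$$

for $m = 2, \dots, M$. Here (52) is due to (50). By the definition of $S$,

$$y_m = \left\lceil\frac{\varphi_m}{L} - \left(\frac{y_1}{L} - \left\lfloor\frac{y_1}{L}\right\rfloor\right)\lambda_m - \frac{1}{2}\right\rceil - \left\lfloor\frac{y_1}{L}\right\rfloor\lambda_m. \tag{53}$$

By virtue of (50) and the fact that $0 \le \ell < L$, we have

$$0 \le \frac{y_1}{L} - z_1 < 1,$$

which leads to

$$\left\lfloor\frac{y_1}{L} - z_1\right\rfloor = 0$$

and consequently

$$z_1 = \left\lfloor\frac{y_1}{L}\right\rfloor. \tag{54}$$

Combining (50), (52), (53) and (54) yields

$$y_m = z_m - z_1\lambda_m = z_m - \frac{y_1 - \ell}{L}\,\lambda_m$$

for $m = 2, \dots, M$. This proves (51). It follows that (49) is true and the lemma is thus proven.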

□ 
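The construction used in the proof can be replayed numerically. The sketch below is illustrative only: it assumes $G$ and $\xi$ act as in (47), i.e. $w_1 = C_1(y_1 - \varphi_1)$ and $w_m = C_m(\lambda_m y_1 + L y_m - \varphi_m)$, draws random test values, builds $y$ from $\ell$ through (45) and (51), and compares both sides of (44). The function name `lemma1_sides` and all numerical values are assumptions.

```python
import math
import random

def sym_mod(x, m):
    """Symmetric modulus with values in [-m/2, m/2)."""
    return x - m * math.floor(x / m + 0.5)

def lemma1_sides(ell, C, lam, phi, L):
    """Return (left side of (44), right side of (44), y_1) for a given ell."""
    M = len(C)
    # Integers z_m selected by the symmetric modulus; z[0] matches (45) and (50).
    z = [math.ceil((phi[m] - lam[m] * ell) / L - 0.5) for m in range(M)]
    y1 = ell + z[0] * L
    y = [y1] + [z[m] - z[0] * lam[m] for m in range(1, M)]      # equation (51)
    w = [C[0] * (y1 - phi[0])] + [
        C[m] * (lam[m] * y1 + L * y[m] - phi[m]) for m in range(1, M)
    ]
    lhs = sum(sym_mod(C[m] * (lam[m] * ell - phi[m]), C[m] * L) ** 2
              for m in range(M))
    rhs = sum(v * v for v in w)
    return lhs, rhs, y1

random.seed(0)
L, M = 16, 3
C = [random.uniform(0.5, 2.0) for _ in range(M)]
lam = [1.0] + [random.uniform(1.0, 10.0) for _ in range(M - 1)]
phi = [random.uniform(0.0, L) for _ in range(M)]

max_gap = 0.0
window_ok = True
for ell in range(L):
    lhs, rhs, y1 = lemma1_sides(ell, C, lam, phi, L)
    max_gap = max(max_gap, abs(lhs - rhs))
    # Claim (46): y_1 lies in [phi_1 - L/2, phi_1 + L/2), up to rounding.
    window_ok = window_ok and (phi[0] - L / 2 - 1e-9 <= y1 < phi[0] + L / 2 + 1e-9)
print(max_gap, window_ok)
```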



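Lemma 2 Let $y = [y_1, \dots, y_M] \in S$. If $\ell = y_1 - \left\lfloor\frac{y_1}{L}\right\rfloor L$, then $0 \le \ell < L$ and

$$\sum_{m=1}^{M}\left[(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L\right]^2 = \|yG - \xi\|^2. \tag{55}$$

Proof. By the definition of $\ell$,

$$\frac{\ell}{L} = \frac{y_1}{L} - \left\lfloor\frac{y_1}{L}\right\rfloor \in [0, 1).$$

Hence, $0 \le \ell < L$. Clearly, there uniquely exist integers $z_1, \dots, z_M$ such that, for $m = 1, \dots, M$,

$$(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L = C_m\lambda_m\ell + z_m C_m L - C_m\varphi_m,$$

where $\lambda_1 = 1$. Therefore, to prove (55) it suffices to show

$$C_1\ell + z_1 C_1 L - C_1\varphi_1 = C_1 y_1 - C_1\varphi_1 \tag{56}$$

and, for $m = 2, \dots, M$,

$$C_m\lambda_m\ell + z_m C_m L - C_m\varphi_m = C_m\lambda_m y_1 + C_m L y_m - C_m\varphi_m. \tag{57}$$

Equation (56) can be simplified as

$$\ell + z_1 L = y_1, \tag{58}$$

which can be further reduced to

$$z_1 = \left\lfloor\frac{y_1}{L}\right\rfloor \tag{59}$$

by invoking the definition of $\ell$. By the definition of the symmetric modulus operator $\mathrm{mod}^{*}$,

$$-\frac{C_1 L}{2} \le C_1\ell + z_1 C_1 L - C_1\varphi_1 < \frac{C_1 L}{2},$$

which can be rewritten as

$$-1 < \frac{\varphi_1 - \ell}{L} - \frac{1}{2} - z_1 \le 0,$$

or equivalently,

$$z_1 = \left\lceil\frac{\varphi_1 - \ell}{L} - \frac{1}{2}\right\rceil. \tag{60}$$

By virtue of (60) and the definition of $\ell$,

$$z_1 = \left\lceil\frac{\varphi_1}{L} - \frac{y_1}{L} - \frac{1}{2} + \left\lfloor\frac{y_1}{L}\right\rfloor\right\rceil = \left\lceil\frac{\varphi_1}{L} - \frac{y_1}{L} - \frac{1}{2}\right\rceil + \left\lfloor\frac{y_1}{L}\right\rfloor. \tag{61}$$

Remember that $y_1$ is restricted by condition

$$-\frac{L}{2} + \varphi_1 \le y_1 < \frac{L}{2} + \varphi_1,$$

or equivalently

$$-1 < \frac{\varphi_1}{L} - \frac{y_1}{L} - \frac{1}{2} \le 0,$$

which implies

$$\left\lceil\frac{\varphi_1}{L} - \frac{y_1}{L} - \frac{1}{2}\right\rceil = 0. \tag{62}$$

Combining (61) with (62) yields (59) and consequently proves (56).

We now turn our attention to the proof of (57). By the definition of $\ell$, (57) can be rewritten as

$$\lambda_m\left(y_1 - \left\lfloor\frac{y_1}{L}\right\rfloor L\right) + z_m L = y_1\lambda_m + y_m L,$$

which can be further simplified as

$$z_m = y_m + \left\lfloor\frac{y_1}{L}\right\rfloor\lambda_m. \tag{63}$$

By the definition of the symmetric modulus operator $\mathrm{mod}^{*}$, we have

$$-\frac{C_m L}{2} \le C_m\lambda_m\ell + z_m C_m L - C_m\varphi_m < \frac{C_m L}{2},$$

which can be rewritten as

$$-\frac{C_m L}{2} \le C_m\left[\lambda_m(\ell + z_1 L) + (z_m - z_1\lambda_m)L - \varphi_m\right] < \frac{C_m L}{2},$$

or equivalently,

$$-1 < \frac{\varphi_m}{L} - \left(\frac{\ell + z_1 L}{L} - z_1\right)\lambda_m - \frac{1}{2} - z_m \le 0.$$

Since $z_m$ is an integer, it can be determined that

$$z_m = \left\lceil\frac{\varphi_m}{L} - \left(\frac{\ell + z_1 L}{L} - z_1\right)\lambda_m - \frac{1}{2}\right\rceil. \tag{64}$$

Note that (58) is true since (59) has been established. Using (58), (59) and (64), we obtain

$$z_m = \left\lceil\frac{\varphi_m}{L} - \left(\frac{y_1}{L} - \left\lfloor\frac{y_1}{L}\right\rfloor\right)\lambda_m - \frac{1}{2}\right\rceil. \tag{65}$$

Invoking (53) and (65) leads to (63). This proves (57) and the proof of the lemma is thus completed.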

□ 



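We are now in a position to prove Theorem 3. By Lemma 1, we have

$$\min_{\ell\in\{0,\dots,L-1\}}\ \sum_{m=1}^{M}\left[(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L\right]^2 \ \ge\ \min_{y\in S}\|yG - \xi\|^2.$$

On the other hand, by Lemma 2, we have

$$\min_{\ell\in\{0,\dots,L-1\}}\ \sum_{m=1}^{M}\left[(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L\right]^2 \ \le\ \min_{y\in S}\|yG - \xi\|^2.$$

Therefore,

$$\min_{\ell\in\{0,\dots,L-1\}}\ \sum_{m=1}^{M}\left[(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L\right]^2 = \min_{y\in S}\|yG - \xi\|^2.$$

Since $\hat{y}_1$ is the first entry of

$$\hat{y} = [\hat{y}_1, \dots, \hat{y}_M] = \arg\min_{y\in S}\|yG - \xi\|^2$$

and $\hat{\ell}$ is unique, it follows from Lemma 2 that

$$\hat{\ell} = \arg\min_{\ell\in\{0,\dots,L-1\}}\ \sum_{m=1}^{M}\left[(C_m\lambda_m\ell - C_m\varphi_m)\ \mathrm{mod}^{*}\ C_m L\right]^2 = \hat{y}_1 - \left\lfloor\frac{\hat{y}_1}{L}\right\rfloor L.$$

The proof of Theorem 3 is thus completed.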

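D PROOF OF THEOREM 4

By the definitions of $G$ and $\xi$,

$$\|yG - \xi\|^2 = [C_1(y_1 - \varphi_1)]^2 + \sum_{m=2}^{M}[C_m(\lambda_m y_1 + L y_m - \varphi_m)]^2.$$

Note that

$$[C_1(y_1 - \varphi_1)]^2 + \sum_{m=2}^{M}[C_m(\lambda_m y_1 + L y_m - \varphi_m)]^2 < \gamma^2$$

if and only if

$$[C_1(y_1 - \varphi_1)]^2 < \gamma^2, \tag{66}$$

$$[C_1(y_1 - \varphi_1)]^2 + \sum_{p=2}^{m}[C_p(\lambda_p y_1 + L y_p - \varphi_p)]^2 < \gamma^2 \quad \text{for } m = 2, \dots, M. \tag{67}$$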

Inequalities ([66]) and (f6T|) can be shown to be equivalent to (fT3|) and (JTSj) 
by invoking the definitions of 5 and /z m . 
E DATA OF UNITARY SPACE-TIME CODES

Constellation for M = N = b = 2, L = 1024, R = 6 



For q = 0,1,2,3, 



A q = diag ( exp ( ^[1 376] 



Ag=B t 



L 

= ^2x2- 



B_1 =
 0.5192 + 0.1730i   0.7689 + 0.3305i
 0.3249 + 0.7713i  -0.1692 - 0.5205i

B_2 =
 0.4772 - 0.3219i   0.0907 + 0.8127i
-0.1774 + 0.7983i   0.4398 + 0.3713i

B_3 =
-0.4458 + 0.3772i   0.7645 + 0.2729i
-0.6303 + 0.5115i  -0.5459 - 0.2075i



See Figure [3] for the corresponding performance simulation results. 
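Because the tabulated entries are rounded to four decimals, a reader transcribing them may want to confirm that a matrix is unitary up to rounding error. The sketch below does this for the first $2\times 2$ matrix listed for the $M = N = b = 2$ constellation; the tolerance of $10^{-3}$ is an assumption that reflects the four-digit rounding.

```python
import numpy as np

# First 2 x 2 matrix listed for the M = N = b = 2, L = 1024 constellation.
B = np.array([
    [0.5192 + 0.1730j, 0.7689 + 0.3305j],
    [0.3249 + 0.7713j, -0.1692 - 0.5205j],
])

def unitarity_defect(a):
    """Frobenius norm of A A^H - I; zero for an exactly unitary matrix."""
    return np.linalg.norm(a @ a.conj().T - np.eye(a.shape[0]))

defect = unitarity_defect(B)
print(f"unitarity defect: {defect:.2e}")
```

The same check is a convenient way to detect transcription errors such as a dropped sign or a swapped entry.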



Constellation for M = N = 2, b = 3, L = 512, R = 6



For q = 0, 7, 



A„ - diag ( rxp ( ^[1 188] 



A q — Bo — hx2- 



0.3408 + 0.6630i 
-0.2401 + 0.6218i 

0.4230 + 0.2881? 
0.8585 + 0.0319i 



-0.1400 - 0.6517i 
-0.4402 + 0.6016i 

-0.7279 + 0.4563? 
0.2226 - 0.4609i 
B 4



B 5 



B ( , 



B 7 



0.3663 + 0.1357?' 
-0.5379 - 0.7470? 

0.7428 + 0.1845? 
-0.6130 - 0.1962?' 



0.2656 
0.3009 



0.1927? 
0.8954?' 



0.0816 + 0.8219?' 
0.3081 + 0.4722?' 



-0.0442 
-0.5320 



0.7407?' 
0.4079? 



0.8257 + 0.4069?' " 
0.1944 + 0.3388?' 

0.4221 - 0.4858?' " 
0.5391 - 0.5433? 

-0.9238 + 0.1975? ' 
-0.0304 + 0.3267? 

-0.4396 - 0.3530? " 
0.8099 + 0.1620?' 

0.5677 - 0.3564? 
-0.1131 + 0.7334? 



See Figure [3] for the corresponding performance simulation results. 
Constellation for M = N = 2, b = 4, L = 256, R = 6



For q = 0, 1, • ■ ■ , 15, 



A q = diag ( exp ( —[I 75.7044] 



2m, 



A q — Bq — hx2- 



Bi 

B 2 
B 3 
B A 
B 5 ~- 
B e 



0.3912 - 0.8587?' 0.1204 - 0.3083?' 
0.2004 + 0.2635? -0.6117 - 0.7185? 

-0.1412 + 0.1279?' 0.1820 + 0.9647? 
0.7979 - 0.5718?' 0.0138 + 0.1900? 

0.4099 + 0.6855?' 0.5015 - 0.3325? 
-0.5988 - 0.0590?' 0.0412 - 0.7976?' 



0.2787 
0.8235 



- 0.3877? 

- 0.3064?' 



-0.7060 + 0.4436?' 
0.5057 - 0.2215?' 

0.5580 + 0.0014? 
0.7699 - 0.3096? 



-0.6636 - 0.5759?' " 
0.4739 + 0.0588?' 

-0.3287 - 0.4435? 
-0.3922 - 0.7358? 

-0.0343 - 0.8292? " 
0.2307 + 0.5080? 
-0.2027 + 0.7271i 0.2565 + 0.6037i
-0.6504 - 0.0851i 0.6461 - 0.3904i

-0.2526 + 0.2877i -0.2635 - 0.8854i
0.3749 - 0.8443i -0.2137 - 0.3177i

0.2828 - 0.3666? " 
0.6780 - 0.5708? 

-0.0195 + 0.3903?' 
0.8409 + 0.3744? 

-0.4858 - 0.4989?' 
0.6347 + 0.3350? 

0.7232 + 0.5792?' 
-0.3248 -0.1899? 

-0.0346 + 0.5575? 
-0.7225 + 0.4075? 

-0.6849 + 0.3647? 
0.3313 - 0.5368?' 

-0.0364 - 0.7926?' 
-0.2606 + 0.5500?' 



See Figure [3] for the corresponding performance simulation results. 
Constellation for M = 3, N = 1, b = 4, L = 256, R = 4 (see the corresponding figure)

A q = diag ( exp ( 33.7365 58.5425] ) ) , A q = B Q = I 3x3 , q = 0, 1, 








B_1 =
 0.7602 + 0.1419i  -0.3318 - 0.3072i   0.2330 + 0.3785i
-0.0629 - 0.2186i  -0.1319 - 0.8171i  -0.1144 - 0.5002i
 0.2379 + 0.5419i  -0.1797 + 0.2798i   0.1680 - 0.7148i

B_2 =
-0.1626 + 0.2936i  -0.2426 - 0.5136i   0.2881 + 0.6941i
-0.2142 + 0.8692i   0.3728 + 0.1549i   0.0315 - 0.1861i
 0.0007 + 0.2931i  -0.6903 + 0.1948i  -0.6309 + 0.0408i

B_3 =
-0.3998 + 0.6106i  -0.6134 + 0.2661i   0.0346 - 0.1385i
 0.0378 - 0.1693i  -0.2662 - 0.0232i  -0.9470 + 0.0426i
 0.5614 - 0.3495i  -0.6931 + 0.0357i   0.2809 + 0.0468i



-0.8735 + 0.1501? 
0.4626 + 0.0203?' 

-0.8175 - 0.4231?' 
0.2927 - 0.2589?' 

-0.5913 - 0.4068? 
-0.6674 - 0.1988?' 

0.3609 + 0.1064? 
0.9174 + 0.1295? 

-0.1464 - 0.8164? 
-0.4076 + 0.3819? 

-0.0575 + 0.6282? 
0.3285 + 0.7030?' 

0.4912 + 0.3594? 
0.3456 + 0.7142? 
-0.0076 + 0.1851i
0.2850 - 0.6178?' 
0.2799 - 0.6515?' 

-0.0148 + 0.0411? 
0.2580 - 0.1099? 
0.9480 - 0.1438? 

0.6097 + 0.1555i 
0.4287 - 0.0217?' 
0.6463 + 0.0458?' 

-0.0706-0.1161? 
-0.4231 + 0.4203? 
0.3546 - 0.7072?' 

0.2373 + 0.6258? 
-0.0899 - 0.5506? 
-0.2728 + 0.4080? 

0.2476 + 0.3168?' 
0.4145 - 0.3141? 
0.5080 - 0.5566?' 

-0.4832 - 0.0060? 

0.2751 - 0.0456? 
-0.1269 - 0.8201? 

0.3610 + 0.4449?' 
-0.2332 + 0.7337?' 
_ -0.2399 + 0.1465?' 

-0.0346 - 0.0609?' 

0.0784 + 0.8724? 
-0.3106 - 0.3625? 

-0.0800 - 0.3611? 
-0.0151 + 0.0077?' 
-0.3292 - 0.8687? 

-0.0408 + 0.0395? 
0.8274 - 0.2735?' 
0.4822 + 0.0702?' 



0.2440 - 0.9364?' 
-0.0305 - 0.0343? 
0.1769 - 0.1737?' 

0.5416 - 0.0737?' 
-0.5039 + 0.6258? 
0.2031 - 0.1203? 

0.0383 + 0.3146? 
-0.3612 + 0.5689? 
0.1922 - 0.6392? 

0.1815 + 0.0717?' 
0.7746 + 0.1128? 
0.5745 - 0.1383? 

0.6446 - 0.3587?' 
0.5100 - 0.0798? 
-0.4228 - 0.1028? 

-0.3066 - 0.6917? 

0.6045 - 0.2448? 
-0.0430 + 0.0190? 

-0.4217 + 0.6427?' 
0.2416 - 0.2205? 
0.4829 + 0.2623? 

0.2997 + 0.4246? 
0.1987- 0.0629? 
0.4152 - 0.7170? 



-0 


6038 


+ 


0.5991?' 


-0 


1136 


+ 


0.2598? 





1769 


+ 


0.4060? 


-0 


4150 


+ 


0.0674?' 


-0 


7741 


+ 


0.4418? 





1661 




0.0354?' 





4025 


+ 


0.4897?' 





3768 


+ 


0.1065?' 


-0 


5340 




0.3997? 
-0.1497 - 0.0827i
-0.2300 - 0.6943? 
0.2094 + 0.6260? 

0.1535 - 0.8221?' 
-0.2723 - 0.4490?' 
0.0806 + 0.1354? 

-0.5723 + 0.4197?' " 
0.5271 - 0.2896? 
0.1045 - 0.3518?' _ 

0.8110 + 0.5345?' " 
-0.1761 + 0.0243? 
-0.1161 - 0.1071?' _ 

-0.0844 - 0.0277?' 
-0.2909 + 0.5811? 
-0.6564 + 0.3728? 

-0.3383 - 0.3891?' " 
-0.4322 + 0.3426? 
0.4250 - 0.4993? _ 

0.4049 + 0.1075? 
0.8445 + 0.3195? 
-0.0876 + 0.0397? 

0.5036 + 0.3846? 
-0.0908 - 0.5963?' 
0.0309 + 0.4833? 

-0.4153 - 0.3148?' 
0.3898 + 0.0206?' 
0.6732 - 0.3505? 

-0.0114 - 0.8284? 
0.1714 + 0.4195?' 
0.0598 + 0.3235? 

-0.1679 - 0.7529?' 

0.2514 + 0.1554? 
-0.2202 -0.5188? 



B 



15 



0.4458 + 0.2691?' 
-0.5050 - 0.1408? 
0.1380 - 0.6595?' 



-0.7835 + 0.2801?' 
-0.4301 + 0.1993? 
-0.2277 - 0.1762?' 



-0.1008 -0.1623? 
0.1919 + 0.6809?' 
0.6115 -0.2987?' 



Constellation for M = 4, N = 2, b = 2, L = 64, R = 2 (see the corresponding figure)



#2 = 



#3 = 



iag 


^exp 


(£[1 5 17 2 8,)) 


3 


A q = 


B 


— -^4x4, 




q = 0, 


1,2, 


3. 









1920 


- 0.0840? 


-0.2404 


-0 


0482? 





4479 







5434? 


-0 


5535 


- 


3061? 





2506 


- 0.2836?' 


-0.0749 


+ 


0316? 


-0 


3375 


+ 





5975? 


-0 


5664 


- 


2417?' 





5003 


+ 0.4453?' 


-0.3540 


- 


4598?: 


-0 


0062 


+ 





1056? 


-0 


0564 


+ 


4475? 


-0 


5925 


+ 0.1151x 


-0.7710 


-0 


0446? 


-0 


1140 


+ 





0945? 





0034 


- 


1313? 





0820 


+ 0.1057?' 


-0.3400 


-0 


6390?' 


-0 


1703 







4908? 





3557 


- 


2486?' 


-0 


1493 


- 0.5629? 


0.0330 


+ 


1103? 


-0 


0758 


+ 





3064? 





7293 


- 


1268? 





4421 


+ 0.5263? 


0.2416 


+ 


4456? 


-0 


0185 







1039? 





4068 


- 


3064? 


-0 


3881 


- 0.1410?' 


0.3614 


+ 


2743? 


-0 


5385 







5739? 


-0 


0546 


+ 


0366?' 


-0 


0069 


+ 0.0651? 


0.5578 


+ 


1900? 


-0 


2629 







3439? 





2290 


- 


6392? 





3545 


- 0.6450? 


-0.1827 


- 


2323?' 





4409 







1617? 





0128 


- 


3878? 


-0 


1216 


+ 0.4867? 


-0.4988 


+ 


1761?' 





2493 


+ 





1850? 


-0 


1557 


- 


5899?' 





1847 


- 0.4103? 


-0.1855 


+ 


5012?' 


-0 


4816 


+ 





5137? 





0359 


- 


1220? 



Constellation for M = 4, N = 2, b = 4, L = 256, R = 3 (see the corresponding figure)



A q = diag exp — [1 7.9761 68.6816 106.6000] , A q = B = J 4x4 , q = 0, 1, • • • , 15. 



£i = 



B 2 = 






4860 - 0.2228? 


-0 


6620 - 





2202?' 


-0 


1202 - 0.3650? 


0.0005 + 0.2823? 





2336 - 0.2148? 





2808 + 





1490? 


-0 


1188 - 0.6816? 


0.5604 + 0.0744?' 





0589 + 0.3944? 


-0 


0909 + 





0457? 





0637 - 0.5901? 


-0.4779 - 0.5000?' 





6445 - 0.1977? 


-0 


6320 + 





0494? 


-0 


0009 + 0.1460? 


0.1766 - 0.3019? 





4407 + 0.1717?' 


-0 


6272 + 





0508? 





4816 + 0.1003? 


-0.2527 + 0.2729?' 





8127 - 0.0589? 





4192 + 





0524? 


-0 


0967 - 0.3587? 


-0.0923 -0.1050? 





0371 - 0.3235? 


-0 


3105 - 





1548? 


-0 


5337 - 0.4044? 


-0.1209 + 0.5573? 





0490 - 0.0620? 





5423 + 





1058? 





1616 + 0.3814? 


-0.2759 + 0.6639? 



5S 



— U 


21)59 — 


U 


12552 





0653 


+ 





2d05z 


— 


7272 + 





A P. A 

47662 


— 


0893 — 


n 



O O £ A A 

32542 


— U 


59 7o — 





00»42 





02y2 







C\TKQA 

0/DO2 





3068 — 





02832 


U 


501 / — 


n 



53 lot 


— U 


CiK K A 

U554 — 


U 


15142 


u 


o2o5 




n 
U 


Q01 CiA 


— U 


25U1 — 


n 
U 


n a n q a 
U4Uo2 


— U 


iuiy — 


u 


not? 1 A 
U2ol2 


— 


1131 — 





73862 


— 


1193 


+ 





0915i 





2566 — 





1384i 


— 


5327 — 





2241i 





2059 - 





4636i 





6366 


+ 





2002i 





1927- 





2336i 


-0 


3935 - 





2254i 





2157- 





2034i 





0779 


+ 





15062 


-0 


0209 + 





7166i 


-0 


2077 + 





5712i 


-0 


1516- 





5863i 


-0 


3966 







247H 





0283 + 





3719i 


-0 


0641 - 





5212i 





5363 + 





0211z 


-0 


4133 







36982' 





4294 - 





2661i 


-0 


3389 + 





1852i 





1610 - 





0104i 


-0 


1781 







0118* 


-0 


3241 + 





45022 





5883 - 





53702' 


-0 


1163 - 





7159i 


-0 


4619 


+ 





04272' 


-0 


3587 + 





1576i 


-0 


2228 + 





2359i 





4255 + 





4822i 


-0 


1864 


+ 





41 732' 


-0 


2767 + 





3474i 


-0 


3281 + 





26962' 


-0 


0127 + 





1852* 


-0 


3440 







6525i 





3753 + 





4480i 





0114 + 





2825i 



0.5947 + 0.08982' 
0.3998 + 0.2454i 
0.6131 - 0.14312 
0.1429 + 0.03782' 



0.0777 - 0.05442' 
0.1581 - 0.479H 
0.2198 + 0.25812 
-0.6927 + 0.3764i 



0.3469 - 0.1757i 
0.2068 + 0.39712' 
-0.6078 + O.O6I82 
-0.0616 + 0.52082' 



-0.6284 + 0.2883i 
0.5486 + 0.15502 
0.0583 - 0.3347i 

-0.0268 + 0.2844i 






5078 


+ 





1352i 





0293 







62212 


-0 


3714 







0843i 





1771 - 





39962 


-0 


0888 


+ 





0556z 


-0 


0162 


+ 





1379i 


-0 


8660 







05702 


-0 


1509 + 





44012 





4760 







12132 


-0 


0101 







3142i 





2954 







O6882 


-0 


1032 + 





74652' 


-0 


6725 







13942' 





1043 







69512' 





0136 


+ 





09762 


-0 


1192 + 





1021z 





2398 







2323z 





2102 


+ 





25192 


-0 


0454 


+ 





76522 





4128 + 





15132 


-0 


4485 


+ 





0876i 





4478 


+ 





35152 


-0 


2440 







36682 





5225 - 





0054i 





2040 







3878i 





6345 


+ 





2146i 


-0 


1047 







115H 


-0 


5780 - 





0302i 





5155 







4710i 


-0 


0559 







3413i 


-0 


1917 







39672 





4298 - 





11822' 


-0 


0808 


+ 





1747i 





6974 


+ 





0254z 


-0 


3297 


+ 





19992 


-0 


4411 + 





36432 


-0 


2270 







15302' 





3884 


+ 





1536? 





2603 


+ 





60902 





2922 - 





4760i 


-0 


5193 


+ 





7098i 


-0 


3679 


+ 





1068z 


-0 


0637 


+ 





26362 





0287 + 





07 372' 





0413 







33592' 


-0 


3492 


+ 





2643i 





3891 


+ 





4333i 


-0 


4673 + 





36902' 



0.6160 + 0.19692 
0.2687 + 0.02042 
0.0571 - 0.42982 
0.5006 - 0.2658i 

0.7748 - 0.09262 
-0.2424 - 0.34392 
0.2561 - 0.11872' 
0.2411 - 0.27622' 



-0.4480 - 0.42192' 
0.6471 + 0.32362' 
0.0075 - 0.10402 
0.2913 + 0.04642 

-0.1322 - 0.17522 
0.3463 - 0.17982 

-0.3777 - 0.54432 
0.6003 + 0.01572 



0.1428 - 0.36122' 
-0.0287 - 0.60652 
0.0461 + 0.04622 
0.3550 + 0.5917i 

0.0966 + 0.30472' 
0.0991 + 0.73022 
0.0569 - 0.13252 
0.2016 - 0.54162' 



-0.2164 - 0.0733i 
-0.1607 - 0.09762 
0.2063 - 0.86852 
-0.3108 + 0.13792 

0.2966 + 0.39092 
-0.3426 + 0.10172' 
-0.5445 - 0.40522 
-0.3096 + 0.27402' 
Table 1: Continuous Diagonal Code $\Lambda = \mathrm{diag}\!\left(\exp\!\left(i\frac{2\pi}{L}u\right)\right)$

M   N   L       u
2   1   4       [1 l.tw41J
3   1   8       [l l.yoo/ z.y/oyj
4   1   16      [i z.yy^b o.uuoo o.yyvyj
5   1   32      [i z.oyoo i.yioo iz.ooyo i4.io(Oj
6   1   64      [1 3.9663 5.8291 17.8483 24.6302 26.5638]
7   1   128     [1 3.9607 21.9899 31.5332 47.3852 54.2734 60.2040]
2   2   16      [1 5.9911]
3   2   64      [1 6.8881 26.5877]
4   2   256     [1 7.9761 68.6816 106.6000]
5   2   1024    [1 61.0483 100.6309 129.7491 356.4678]
6   2   4096    [1 11.8659 404.3640 592.2112 1328.7582 1489.9040]
7   2   16384   [1 300.8485 4019.3073 5142.8482 6816.8842 8098.6177 8109.4273]
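To make the table concrete, the sketch below builds the powers $\Lambda^\ell$ of a diagonal code from one tabulated row, reading the caption as $\Lambda = \mathrm{diag}(\exp(i 2\pi u_m / L))$. That reading of the caption and the helper name `diagonal_code` are assumptions; the row used is the one for $M = 4$, $N = 2$, $L = 256$.

```python
import numpy as np

def diagonal_code(u, L):
    """Return a function ell -> Lambda^ell with Lambda = diag(exp(2*pi*i*u/L))."""
    u = np.asarray(u, dtype=float)

    def power(ell):
        return np.diag(np.exp(2j * np.pi * u * ell / L))

    return power

u = [1, 7.9761, 68.6816, 106.6000]   # row M = 4, N = 2 of Table 1
L = 256
lam_pow = diagonal_code(u, L)

signal = lam_pow(3)                  # the constellation point for ell = 3
print(np.round(np.abs(np.diag(signal)), 6))   # all diagonal entries have unit modulus
```

Since $u_1 = 1$, the first diagonal entry returns to 1 exactly at $\ell = L$, while the non-integer entries of $u$ mean the other diagonal entries generally do not.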



B_12 =
 0.1280 + 0.2465i  -0.6345 - 0.1417i  -0.0766 - 0.4249i   0.0486 - 0.5581i
 0.0816 - 0.6158i   0.2517 - 0.0380i   0.0347 + 0.2233i   0.1347 - 0.6929i
 0.2862 - 0.1410i  -0.3924 + 0.2196i   0.7920 + 0.2295i  -0.0725 + 0.1040i
-0.2304 - 0.6181i  -0.5369 - 0.1485i  -0.2868 + 0.0181i   0.1966 + 0.3650i

B_13 =
 0.0256 - 0.1183i   0.1067 + 0.2697i   0.1343 - 0.1014i   0.9325 - 0.0583i
-0.5312 - 0.2629i  -0.6552 - 0.4068i  -0.0562 - 0.1474i   0.1628 - 0.0512i
 0.5708 - 0.4629i  -0.4947 + 0.2483i   0.1888 + 0.1781i  -0.1143 - 0.2703i
 0.0258 + 0.3055i  -0.0856 - 0.0859i   0.8342 - 0.4284i  -0.0972 - 0.0495i

B_14 =
 0.3080 - 0.2933i   0.4685 - 0.1297i   0.2442 - 0.6850i   0.1051 - 0.2072i
 0.0902 - 0.4068i   0.0109 + 0.4103i   0.0850 + 0.1134i  -0.7756 - 0.1903i
-0.1509 + 0.2560i   0.7121 - 0.2210i  -0.0907 + 0.4222i  -0.1097 - 0.3966i
 0.7444 + 0.0551i   0.1963 + 0.0281i  -0.4936 + 0.1449i  -0.0483 + 0.3696i

B_15 =
 0.2287 + 0.2115i   0.4099 - 0.4573i  -0.6287 + 0.2885i  -0.2128 - 0.0457i
-0.3966 - 0.7004i  -0.0114 - 0.2050i  -0.4089 - 0.1241i   0.2918 + 0.2056i
-0.1676 + 0.2912i   0.4021 - 0.4722i   0.4131 - 0.1470i   0.3944 + 0.3933i
-0.3760 - 0.0294i  -0.1257 - 0.4246i   0.1711 - 0.3427i  -0.7080 - 0.1175i
F STRUCTURE OF ORTHOGONAL DESIGNS

In our simulation of orthogonal designs, the frame length is chosen as $T \ge M$ and the transmitted signals are determined as

$$S_0 = \begin{bmatrix} I_{M\times M}\\ 0_{(T-M)\times M}\end{bmatrix}, \qquad S_\tau = V_\tau S_{\tau-1}, \quad \tau = 1, 2, \dots,$$

where $S_\tau$ is a $T\times M$ matrix, $V_\tau = G(z_1, \dots, z_K)$ is defined by a $T\times T$ orthogonal design $G$ such that $z_1, \dots, z_K$ are mapped from PSK constellations $\mathcal{A}_1, \dots, \mathcal{A}_K$. The choice of $T$ depends on the number of transmit antennas. For $M = 2$ we choose $T = 2$ and use the $2\times 2$ orthogonal design in [3]. For $M = 3$ and 4, we choose $T = 4$ and use the $4\times 4$ orthogonal design in [47]. For $M = 5, 6, 7, 8$, we choose $T = 8$ and use the $8\times 8$ orthogonal design in [47]. It should be noted that such concatenation between the complex square orthogonal designs and the differential unitary space-time modulation scheme has been proposed in [13] and [28]. For the spectral efficiency to be an integer $R$, we use the following PSK constellations



1 / 2irr \ 

Ak= \ 7k exp \ j JW\) 1 r = ' 1 ''"' 2 '^ 



for 1 < k < (TRmod(K)); 

for $(TR \bmod K) < k \le K$. For the $\tau$-th time frame, bits of length $TR$ are mapped into $z_k \in \mathcal{A}_k$, $k = 1, \dots, K$, by Gray codes. The decoding problem is to solve the minimization problem

$$\arg\min_{z_k\in\mathcal{A}_k,\ k=1,\dots,K}\ \|X_\tau - G(z_1, \dots, z_K)\,X_{\tau-1}\|_F. \tag{68}$$

As demonstrated in [35], by exploiting the special structure of the orthogonal design, the data symbols $z_k \in \mathcal{A}_k$, $k = 1, \dots, K$, can be decoupled and decoded individually from (68).
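The decoupling can be illustrated for the smallest case $M = T = 2$. The sketch below is not the simulation code of this paper: it assumes the $2\times 2$ design $G(z_1, z_2) = \begin{bmatrix} z_1 & z_2\\ -z_2^{*} & z_1^{*}\end{bmatrix}$, a 4-point PSK alphabet scaled by $1/\sqrt{2}$ so that $G$ is unitary, and a noise-free channel. It solves (68) by exhaustive search and compares the result with a symbol-by-symbol rule derived for this particular design.

```python
import itertools
import numpy as np

def psk(q):
    """q-point PSK alphabet scaled by 1/sqrt(2) so that design(z1, z2) is unitary."""
    return np.exp(2j * np.pi * np.arange(q) / q) / np.sqrt(2)

def design(z1, z2):
    """Assumed 2 x 2 complex orthogonal design."""
    return np.array([[z1, z2], [-np.conj(z2), np.conj(z1)]])

def decode_joint(x_now, x_prev, alphabet):
    """Exhaustive search for the minimiser of ||X_tau - G(z1, z2) X_{tau-1}||_F."""
    pairs = itertools.product(range(len(alphabet)), repeat=2)
    return min(pairs, key=lambda rs: np.linalg.norm(
        x_now - design(alphabet[rs[0]], alphabet[rs[1]]) @ x_prev))

def decode_decoupled(x_now, x_prev, alphabet):
    """Symbol-by-symbol decoding; np.vdot(b, a) computes sum(conj(b) * a)."""
    energy = np.linalg.norm(x_prev) ** 2
    s1 = np.vdot(x_prev[0], x_now[0]) + np.conj(np.vdot(x_prev[1], x_now[1]))
    s2 = np.vdot(x_prev[1], x_now[0]) - np.conj(np.vdot(x_prev[0], x_now[1]))
    nearest = lambda s: int(np.argmin(np.abs(alphabet - s / energy)))
    return nearest(s1), nearest(s2)

rng = np.random.default_rng(2)
alphabet = psk(4)
h = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))  # unknown channel
s_prev = np.eye(2, dtype=complex)                                    # S_0 = I for T = M
sent, joint, decoupled = [], [], []
for _ in range(20):
    r1, r2 = (int(v) for v in rng.integers(0, 4, size=2))
    s_now = design(alphabet[r1], alphabet[r2]) @ s_prev              # S_tau = V_tau S_{tau-1}
    x_prev, x_now = s_prev @ h, s_now @ h                            # noise-free observations
    sent.append((r1, r2))
    joint.append(tuple(decode_joint(x_now, x_prev, alphabet)))
    decoupled.append(decode_decoupled(x_now, x_prev, alphabet))
    s_prev = s_now
print(sent[:3], decoupled[:3])
```

The joint search visits every pair of symbols, whereas the decoupled rule needs only one nearest-point lookup per symbol, which is the practical benefit of the orthogonal structure.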